Abstract: We present a theoretical framework for the analysis 
of privacy and security tradeoffs in secure biometric authentica- 
tion systems. We use this framework to conduct a comparative 
information-theoretic analysis of two biometric systems that are 
based on linear error correction codes, namely fuzzy commitment 
and secure sketches. We derive upper bounds for the probability 
of false rejection (Pfr) and false acceptance (Pfa) for these 
systems. We use mutual information to quantify the information 
leaked about a user's biometric identity, in the scenario where 
one or multiple biometric enrollments of the user are fully 
or partially compromised. We also quantify the probability of 
successful attack (Psa) based on the compromised information. 
Our analysis reveals that fuzzy commitment and secure sketch 
systems have identical Pfr, Pfa, Psa and information leakage, 
but secure sketch systems have lower storage requirements. We 
analyze both single-factor (keyless) and two-factor (key-based) 
variants of secure biometrics, and consider the most general 
scenarios in which a single user may provide noisy biometric 
enrollments at several access control devices, some of which 
may be subsequently compromised by an attacker. Our analysis 
highlights the revocability and reusability properties of key- 
based systems and exposes a subtle design tradeoff between 
reducing information leakage from compromised systems and 
preventing successful attacks on systems whose data have not 
been compromised. 

Index Terms: Biometrics, Fuzzy Commitment, Secure Sketch, 
Revocability, Reusability, Information Leakage, Privacy, Security 



I. Introduction 

Human biometric measurements such as fingerprints, iris 
scans, face images and ECG signals are attractive tools for 
identifying and authenticating users in access control situa- 
tions. Unlike conventional identifying documents, biometrics 
are difficult to forge. Unlike passwords traditionally used for 
access control, they do not have to be remembered. However, 
biometrics also present some new challenges that are not 
encountered in traditional methods. Noise is a characteristic 
feature of all biometric measurements; every measurement is 
slightly different from all others. In access control systems, the 
issue of noise in biometric measurements is currently tackled 
through pattern recognition. Specifically, a measurement of 
the biometric is taken at the time of enrollment and stored 
in a database of enrolled identities. During authentication, the 
person in question provides a "test" or a "probe" biometric for 
comparison with the stored enrollment biometric. If the probe 
and enrollment biometric are sufficiently close according to a 
similarity metric defined by the pattern recognition algorithm, 
then access is allowed. 

Unfortunately, the standard method described above has a 
serious drawback: an adversary who compromises the device 
gains access to the enrollment biometric. This is a major 
security hazard; the attacker can subsequently use the en- 
rollment biometric to gain repeated access to the system, 
and to any other biometric-based systems in which the user 
has enrolled. This is also a privacy hazard; the attacker has 
gained access to the user's identifying information and can 
henceforth impersonate the user illegally. The seriousness of 
this hazard is greatly increased by the fact that biometrics 
are inherent properties of the human body and cannot be 
revoked and then re-issued like new credit card numbers. To 
mitigate growing concerns about security hazards and identity 
theft, new approaches to biometrics have been studied with 
a three-fold goal. First, the data stored on the access control 
device should provide little or no information about the actual 
biometric. Second, the stored data should not allow an attacker 
to gain unauthorized access to the system or to impersonate the 
identity of a legitimate user successfully. Third, if the user's 
stored data is known to have been compromised, then it should 
be possible to revoke it and issue new stored data. This should 
prevent the adversary from gaining access or stealing the user's 
identity in the future. 

Secure biometric schemes proposed to fulfill the above 
desiderata fall under one of two related categories, viz., 
fuzzy commitment [1], [2] and secure sketch 
schemes [5], [6]. In fuzzy commitment a secret 
vector is combined with the user's enrollment biometric via a 
commitment function. The output of the commitment function 
is stored on the access control device. Access control is 
accomplished by means of a decommitment function. The 
decommitment function takes as its inputs the stored data 
and the user's probe biometric and attempts to recover the 
secret vector. If recovery is successful, access is allowed. In 
contrast, in secure sketch the user provides their biometric 
at enrollment and a "sketch" signal is derived and stored 
on the access control device. When combined with a probe 
biometric from the legitimate user, the enrollment biometric 
can be recovered. If the enrollment biometric is recovered 
successfully, then access is allowed. We later discuss how 
to verify the correctness of this recovery or of successful 
decommitment in fuzzy commitment. Linear error correcting 
codes (ECC) are the most widely used tool for constructing 
both fuzzy commitment schemes [10], [11] and secure 
sketch-based schemes [9]. 

The relationship between secure sketches and fuzzy extractors 
was examined in [5], where it was shown that a secure 
sketch implies the existence of a fuzzy extractor. In the present 
paper, we analyze explicit ECC-based constructions of fuzzy 
commitment and secure sketch. We study both the security 
and privacy hazards mentioned above. Regarding the former, 
we derive upper bounds on the false rejection rate (FRR) 
and false acceptance rate (FAR) for both types of systems. 
Regarding the latter, we characterize the privacy leakage as 
the mutual information between the compromised stored data 
and the user's biometric. Further, a smart adversary may be 
able to increase their likelihood of gaining access to a system 
above the FAR if they have access to some partial compromise 
of stored data and condition their attack on that knowledge. 
We term this the probability of a "successful attack" (Psa) 
and quantify it in some situations. Our analysis establishes 
a strong statement of equivalence: secure sketches and fuzzy 
commitment schemes are equivalent in terms of the FRR, FAR, 
information leakage, and Psa. 

There have been many insightful studies of the information 
leakage that occurs when data stored on the access control 
device is compromised [12], [5], [13], [14]. An important 
insight is that a useful sketch, i.e., one that correctly authenticates 
noisy samples from a legitimate user, must leak some 
information about the underlying biometric [5]. Extending 
this idea, [12] considers a generalized challenge-response 
setting in which a strong adversary examines sketches from 
several chosen perturbations of the challenger's biometric, 
until the biometric has been guessed completely. We consider a 
different scenario in which an adversary compromises a chosen 
subset of the available access control devices and, knowing the 
error correcting codes associated with each, attempts to attack 
the user's system. We think that this problem formulation is 
more reflective of emerging networks of biometric systems. 
Further, it raises many interesting challenges, e.g., we may ask 
how to choose the perturbations or error correcting codes so 
as to leak the least information about the user's biometric. In 
this sense, our work is related to the privacy analysis of [13], 
where the authors consider a sketch indistinguishability game 
and sketch irreversibility game and give conditions on the ECC 
design that minimize the adversary's advantage. We note that, 
in the analyses cited above, the emphasis is on information 
leakage about the user's biometric as the adversary's 
prime objective. In practice, however, the adversary may have 
a second objective, namely to compromise some devices and 
use the information gained to log in to other devices. It may 
not be necessary to discover the user's biometric. Our analysis 
reveals a subtle conflict between reducing information leakage 
from compromised systems and preventing successful attacks 
on systems whose data have not been compromised. 

A different, but related, line of work focuses on the problem 
of secret key agreement via public discussion [15], [16], [17], 
[18], [19]. In this problem two parties hold correlated pieces 
of information and desire to generate matching secret keys 
through a public discussion. However, an eavesdropper who 
taps into the public discussion should learn nothing about 
the keys. Of interest in this line of work is the fundamental 
asymptotic tradeoff between the secret key rate (security) and 
biometric information leakage (privacy). Secret key agreement 
by itself does not form a biometric authentication system but it 
can be used to construct one. In contrast, we explicitly analyze 
the fundamental non-asymptotic privacy-security tradeoff in 
biometric systems that are based on linear ECCs and explicitly 
relate it to ECC design parameters. 

Fig. 1. Noisy measurements $A_1, A_2, \ldots$ of a user's underlying biometric 
$A_0$ are encoded at each access control device to generate authentication data, 
which is stored in the device, and a secret key. Our goal is to analyze the 
tradeoffs between authentication performance and information leakage from 
compromised stored authentication data and secret keys. 

The remainder of this paper is organized as follows: Section II 
describes a general framework for analyzing secure biometrics 
and defines the metrics by which security and privacy (see 
footnote 1) are evaluated. In Section III we describe how to realize fuzzy 
commitment and secure sketch schemes using linear ECCs. 
We show the equivalence between the realizations of fuzzy 
commitment and secure sketch in terms of their security and 
privacy metrics. In Section IV we expand our attention to 
include multiple devices. We derive the information leakage 
for attack scenarios in which an adversary compromises the 
stored data and/or secret keys of multiple devices. We show 
how the information leakage depends on the ECCs used at 
the devices. We characterize how the selection of the ECCs 
affects the probability that the adversary can use information 
gained from the compromised devices to successfully attack 
(i.e., gain access to) uncompromised devices, and how this 
objective conflicts with the aim of minimizing information 
leaked about the user's biometric. Section V concludes the 
paper. 



Footnote 1: In this work, compromising privacy refers to leaking information about 
the user's biometric, while compromising security refers to gaining access to 
the system. 

II. A Generalized Secure Biometrics Framework 

Consider the scenario with several access control devices 
shown in Fig. 1. A user has a biometric $A_0$ given by 
nature. He enrolls at several access control devices using noisy 
measurements $A_i$ of the underlying biometric $A_0$. From each 
measurement $A_i$, encoded data is extracted and stored on 
the respective device to aid in authentication. Optionally, a 
secret key or password is provided to the user. A legitimate 
user should be able to gain access to any of the devices by 
providing a probe biometric that is again a noisy measurement 
of the underlying $A_0$. Any analysis of the privacy and security 
tradeoffs in secure biometrics must take into account not 
only the authentication performance but also the information 
leakage when the stored data and/or keys for one or more 
devices are compromised. 

With the above motivation, we start by presenting an 
abstract model of a secure biometric system for a single 
access control device in Section II-A. We then describe design 
objectives in terms of the system's performance metrics in 
Section II-B. 

A. Model of a Secure Biometric System 
Fig. 2. Generalized model of a secure biometric system for a single access 
control device. This model encompasses both fuzzy commitment-based and 
secure sketch-based realizations that are described and analyzed in Section III. 
For keyless realizations, K is null. For two-factor realizations, K is a secret 
key output by the randomized encoding function. Given the probe biometric 
and, in two-factor realizations, a secret key, the decoder solves a hypothesis 
testing problem to determine if the user is genuine or an impostor. 

Figure 2 depicts a generalized model of a secure biometric 
system for a single access control device. The system consists 
of encoding and decoding modules that manipulate features 
extracted from measurements of human biometrics. In biomet- 
rics parlance the terms "biometric", "biometric measurement", 
and "biometric feature vector", have different meanings. A 
fingerprint, iris, or a face is a biometric, the measurement of 
which produces a digitized image from which features are extracted 
for authentication or recognition. However, for brevity 
of exposition, we interchangeably use the terms "biometric" 
and "biometric measurement" to denote a biometric feature 
vector. We make the additional simplifying assumption that 
all feature vectors and secret keys are length-n sequences 
of binary numbers. The generalization to non-binary finite 
alphabets is straightforward. 

Biometric Measurement Model: The process of measuring 
a biometric, extracting suitable feature vectors, and converting 
them to length-n binary sequences is inherently prone to sensing 
uncertainty, e.g., in the orientation, size, and illumination 
of an iris or a face, as well as noise in the sensing elements. 
Since we are interested in scenarios where a user can enroll 
the same biometric at multiple access control devices (see 
Section IV-A), we posit an underlying "ground truth" length-n 
binary biometric feature vector $A_0 := (A_{0,1}, \ldots, A_{0,n})$ 
whose components have an i.i.d. Bernoulli(0.5) distribution (see footnote 2). 
We need to model the combined effect of a measurement 
followed by the extraction of a length-n binarized feature 
vector (or, for brevity, the biometric measurement). We model 
this as component-wise modulo-two addition of $A_0$ with a 
length-n i.i.d. Bernoulli "noise" sequence. The noise sequence 
is assumed to be independent of the ground truth and any 
previous and future measurement noise sequences. In the 
language of information theory, the biometric measurement is 
the output of a "binary symmetric channel" (BSC), where the 
channel input is $A_0$. Thus, at enrollment, the user provides 
an enrollment biometric measurement $A := (A_1, \ldots, A_n)$, 
which is the output of a BSC with crossover probability $p_1$ 
and channel input $A_0$. Similarly, at authentication, the user 
provides a probe biometric measurement $B := (B_1, \ldots, B_n)$, 
which is the output of a BSC with crossover probability $a$ and 
channel input $A_0$. This probe measurement is used by 
the decoding module of the access control device to verify the 
user's identity. We further assume that $p_1 \in [0, 0.5)$ and 
$a \in [0, 0.5)$, i.e., it is more likely that coordinates of the biometric 
measurement and probe measurement match than that they 
do not. To see the statistical dependency between A and B, 
observe that $A_0$, A and B are all i.i.d. Bernoulli(0.5) sequences. 
This, along with the BSC channel dependency explained 
above, means that A and B are, in turn, related by a BSC 
with crossover probability $p = p_1 * a = p_1(1 - a) + (1 - p_1)a$. 
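
The composition of the two measurement channels can be sanity-checked numerically. The following is a minimal sketch, assuming illustrative crossover probabilities; the helper names `bsc`, `compose_crossover`, and `empirical_mismatch` are hypothetical and do not come from the original analysis.

```python
import random

def bsc(bits, crossover, rng):
    """Pass a bit sequence through a BSC: flip each bit independently
    with probability `crossover`."""
    return [b ^ (rng.random() < crossover) for b in bits]

def compose_crossover(p1, a):
    """Crossover probability between two noisy views of the same input:
    p = p1 * a = p1 (1 - a) + (1 - p1) a."""
    return p1 * (1 - a) + (1 - p1) * a

def empirical_mismatch(n, p1, a, seed=0):
    """Simulate ground truth, enrollment and probe; return the fraction
    of coordinates in which enrollment and probe disagree."""
    rng = random.Random(seed)
    ground_truth = [rng.getrandbits(1) for _ in range(n)]
    enrollment = bsc(ground_truth, p1, rng)   # BSC(p1)
    probe = bsc(ground_truth, a, rng)         # BSC(a), independent noise
    return sum(x != y for x, y in zip(enrollment, probe)) / n

# Illustrative values (assumptions): p1 = 0.1, a = 0.2
print(compose_crossover(0.1, 0.2))
print(empirical_mismatch(200000, 0.1, 0.2))
```

For these assumed values the closed-form expression gives 0.1(0.8) + 0.9(0.2) = 0.26, and the simulated mismatch rate should be close to that figure for large n.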

Enrollment: The (potentially randomized) encoding function 
F(·) takes the enrollment biometric A as input and 
produces as output $S \in \mathcal{S}$, $|\mathcal{S}| < \infty$, which is stored on 
the access control device. Optionally, a key vector $K \in \mathcal{K}$, 
$|\mathcal{K}| < \infty$, which is returned to the user, is also produced. 
Thus, (S, K) = F(A). The encoding function is governed 
by the conditional distribution $P_{S,K|A}$. Depending upon the 
physical realization of the system, the user may be required 
to carry the key K on a smart card. Such systems are called 
two-factor systems because both the key and the stored data 
are needed for authentication. Systems where K is null are 
called keyless systems; they do not require the use of a smart 
card. 

Authentication: To perform biometric authentication, a 
legitimate user provides the probe biometric B and the key 
K. An adversary, on the other hand, provides a stolen or 
artificially synthesized biometric C and a stolen or artificially 
synthesized key J. The presence of the legitimate user or the 
adversary is indicated by the unknown binary parameter $\theta$, 
with $\theta = 1$ for the legitimate user and $\theta = 0$ for the adversary. Let 
(D, L) denote the (biometric, key) pair that is provided during 
the authentication step. We write 

$$(D, L) = \begin{cases} (B, K), & \text{if } \theta = 1, \\ (C, J), & \text{if } \theta = 0. \end{cases}$$

The authentication decision is computed by the decoding function 
as $\hat{\theta} = g(D, L, S)$. In keyless systems, the procedure is 
similar with K, J, and L removed from the above description. 

Footnote 2: Binarized feature vectors extracted from biometric measurements are, 
in general, neither independent nor identically distributed. It is, however, 
possible to design feature transformation algorithms that can convert them 
into binary feature vectors the statistics of which are quite close to those of 
i.i.d. Bernoulli(0.5) bits [9]. 



B. Performance Metrics 



We now define metrics used to evaluate the performance 
of the secure biometric system of Fig. 2. For example, it is 
necessary to quantify how reliably the system authenticates a 
genuine user and rejects an impostor, to quantify how much 
information is leaked about the underlying biometric when the 
stored data and/or the secret key are compromised, and so on. 

1) Probability of Missed Detection: This quantity is also 
called the False Rejection Rate (FRR), defined as 

$$P_{fr} := \Pr[\hat{\theta} = 0 \mid \theta = 1] = \Pr[g(B, K, S) = 0].$$

Pfr depends only on the known statistics of 
(A, B, K) and the specification of the system, F(·) and 
g(·). A low value of Pfr indicates that the system reliably 
authenticates a genuine user. Thus Pfr quantifies 
the accuracy of the biometric system. 

2) Probability of False Detection: A baseline probability 
of false detection, also called the False Acceptance Rate 
(FAR), is the worst-case probability of false detection 
across all attack vectors and keys that can be generated 
without any knowledge of the ground truth or of any 
measurements, keys, or stored data. It is defined as 

$$P_{fa} := \max_{P_{C,J}} \Pr[\hat{\theta} = 1 \mid \theta = 0] = \max_{P_{C,J}} \Pr[g(C, J, S) = 1],$$

where (C, J) is independent of ($A_0$, A, B, K, S). A low 
value of Pfa indicates that the system reliably prevents 
impostors from gaining access to the system by pure 
chance. Thus Pfa quantifies one aspect of the security 
of the biometric system. Typically, a system designer 
is faced with choosing an appropriate tradeoff between 
Pfa and Pfr. 

3) Privacy Leakage: We measure the information leaked 
about the enrollment biometric A (respectively the 
ground truth $A_0$) in various scenarios of data exposure. 
These include when either the stored data S, the secret 
key K, or both are compromised. We characterize the 
various scenarios using the following mutual information 
quantities: I(A; S), I(A; K), and I(A; S, K) 
(respectively $I(A_0; S)$, $I(A_0; K)$, and $I(A_0; S, K)$). 
These are information-theoretic measures of independence 
(see footnote 3). 

4) Probability of Successful Attack: In the event of 
data exposure, the probability of false detection could 
increase beyond the nominal value of Pfa. In addition 
to exposure of the stored data S and the secret key 
K, mentioned above, we may also need to consider 
scenarios where an adversary coercively gains access to 
A as well. We need to capture the possibility that the 
attacker's biometric-key pair (C, J) is generated using 
knowledge of the compromised data $V \subseteq \{A, S, K\}$. 
We denote by Psa the probability of false detection in 
such situations, defined as 

$$P_{sa}(V) := \max_{P_{C,J|V}} \Pr[\hat{\theta} = 1 \mid \theta = 0] = \max_{P_{C,J|V}} \Pr[g(C, J, S) = 1].$$

We refer to Psa(V) as the "Successful Attack Rate" 
(SAR) to distinguish it from Pfa. The SAR captures 
the probability of false detection when an adversary's 
attack is aided by knowledge of V. We note, in passing, 
that in any keyless or two-factor system, knowledge of 
the stored data S can drastically improve the ability 
of the adversary to gain access, thus compromising the 
security of the system. We will characterize this effect in 
Theorem 2. Ideally, in two-factor systems, if an attacker 
has knowledge of only one factor (i.e., either the 
enrollment biometric A or the key K, but not both), 
they will not be able to use that information to improve 
their ability to authenticate falsely. This motivates the 
following definition. We say that a system is two-factor 
secure if Psa(A) = Psa(K) = Pfa. 

5) Storage Requirements: Lastly, the system data storage 
requirement is given by the minimum number of bits 
needed to represent S. This is not more than $\log_2 |\mathcal{S}|$ 
bits. The key length requirement is given by the minimum 
number of bits needed to represent K, which is 
not more than $\log_2 |\mathcal{K}|$ bits. 

III. System Constructions 

In this section, we discuss a single access control device 
in isolation, and analyze system privacy and security. We 
describe two types of systems, the first is a fuzzy commitment 
system and the second is a secure sketch system; for both, we 
assume an implementation based on linear error correcting 
codes. We detail both keyless and keyed (two-factor) variants. 
The linear error correcting code construction allows us to 
demonstrate a number of performance-equivalence properties 
between fuzzy commitment and secure sketch systems. Considerations 
of privacy and security for a network of access 
control devices are deferred to Sec. IV. 

Footnote 3: Mutual information between two sets of quantities is always non-negative 
and is equal to zero if, and only if, the two sets are independent [20]. 
Furthermore, we can always write the mutual information between two 
random quantities X and Y as $I(X; Y) = H(X) - H(X|Y)$, where 
$H(\cdot)$ and $H(\cdot|\cdot)$ are, respectively, the entropy and conditional entropy of 
the argument(s). Thus, mutual information characterizes the reduction in 
uncertainty about one random quantity, X, when given knowledge of another, 
Y. 

A. Fuzzy Commitment Systems based on ECC 

A fuzzy commitment scheme binds a random vector to an 
enrollment biometric A to produce a length-n stored data 
vector S. This is diagrammed in Fig. 3 for the case of a two-factor 
(keyed) system. The keyless variant, shown in Fig. 4, 
is the special case where the smart card key K and decoding 
key L are both the all-zero sequence. Note that both systems 
fit within the general framework of Fig. 2. 

We exclusively consider fuzzy commitment schemes 
wherein the random vector corresponds to a uniformly selected 
codeword of a binary [n, k] linear error correcting code. We 
use G to denote the code's k × n generator matrix and H to 
denote the code's m × n parity check matrix with m = n - k. 

Enrollment: The enrollment procedure first generates two 
independent i.i.d. Bernoulli(0.5) sequences, the key sequence 
$K := (K_1, \ldots, K_n)$ and the auxiliary sequence 
$Z := (Z_1, \ldots, Z_k)$. The auxiliary sequence Z selects a codeword 
$G^T Z$ uniformly from the set of all codewords of the linear 
error correction code with generator matrix G. The codeword 
is then additively perturbed by the enrollment biometric A and 
the result is additively masked by the randomly generated key 
sequence K to produce the stored data S: 

$$S = A \oplus G^T Z \oplus K.$$

Authentication: At authentication, the system has access to 
the stored data S and is presented with the pair (D, L). The 
authentication procedure consists of two steps. First, syndrome 
decoding is performed to produce an estimate $\hat{W}$ of the error 
vector $A \oplus D$ as follows: 

$$\hat{W} = \arg\min_{W : HW = H(D \oplus L \oplus S)} d(W),$$

where d(·) is the Hamming weight. If L = K, the masking 
effect of the key is canceled out and the syndrome decoding 
procedure is then operationally equivalent to the optimal 
channel decoding of the codeword $G^T Z$ when corrupted by 
$A \oplus D$. Second, given $\hat{W}$, an estimate $\hat{\theta}$ of $\theta$ is made as 

$$d(\hat{W}) \underset{\hat{\theta} = 0}{\overset{\hat{\theta} = 1}{\lessgtr}} \tau n. \tag{1}$$




If $\hat{\theta} = 1$ the decision is made that the biometric A and the 
probe D are close enough (the estimate of this distance is the 
weight of W) that access should be granted. 
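
The enrollment and authentication steps above can be sketched end to end on a toy code. This is a minimal illustration, not the authors' implementation: it assumes a [9, 1] repetition code, brute-force syndrome decoding, a threshold τ = 0.3, and a strict inequality in test (1); all function names are hypothetical.

```python
import itertools
import random

N, K_DIM = 9, 1                       # toy [n, k] = [9, 1] repetition code (assumption)
M = N - K_DIM
G = [[1] * N]                         # k x n generator matrix
# m x n parity check matrix: row r compares bit r with the last bit
H = [[int(j == r or j == N - 1) for j in range(N)] for r in range(M)]

def xor(*vectors):
    """Component-wise modulo-two sum of equal-length bit vectors."""
    return [sum(bits) % 2 for bits in zip(*vectors)]

def syndrome(v):
    """Compute Hv over GF(2)."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

def encode(z):
    """Compute the codeword G^T z over GF(2)."""
    return [sum(z[i] * G[i][j] for i in range(K_DIM)) % 2 for j in range(N)]

def coset_leader(s):
    """Brute-force syndrome decoding: a lowest-weight w with Hw = s."""
    for weight in range(N + 1):
        for support in itertools.combinations(range(N), weight):
            w = [int(j in support) for j in range(N)]
            if syndrome(w) == s:
                return w
    raise ValueError("no vector with this syndrome")

def enroll(a, rng):
    """Return (stored data S, key K) with S = A xor G^T Z xor K."""
    key = [rng.getrandbits(1) for _ in range(N)]
    z = [rng.getrandbits(1) for _ in range(K_DIM)]
    return xor(a, encode(z), key), key

def authenticate(d, l, stored, tau):
    """Threshold test: return 1 (accept) iff d(W_hat) < tau * n."""
    w_hat = coset_leader(syndrome(xor(d, l, stored)))
    return int(sum(w_hat) < tau * N)

def flip(v, positions):
    """Flip the bits of v at the given positions."""
    return [b ^ int(j in positions) for j, b in enumerate(v)]

rng = random.Random(1)
a = [rng.getrandbits(1) for _ in range(N)]
stored, key = enroll(a, rng)
print(authenticate(flip(a, {0, 5}), key, stored, tau=0.3))        # prints 1
print(authenticate(flip(a, {0, 2, 5, 7}), key, stored, tau=0.3))  # prints 0
```

With the correct key, the probe that differs from the enrollment in two positions yields a coset leader of weight 2 < 2.7 and is accepted, while the probe that differs in four positions is rejected. The brute-force decoder is exponential in n and is only meant for toy sizes.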

Fig. 3. A two-factor fuzzy commitment system stores the bitwise XOR of a 
randomly generated codeword of a linear error correcting code, the enrollment 
biometric, and a randomly generated secret key. 

Fig. 4. A keyless fuzzy commitment system stores the bitwise XOR of 
a randomly generated codeword of a linear error correcting code and the 
enrollment biometric. 

We make the following assumptions about system operating 
parameters. Recall that if L = K the decoding process is the 
same as optimal channel decoding. This implies that if the 
rate of the error correcting code (specified by the choice of 
H) is below the channel capacity of the binary symmetric 
channel (BSC) with crossover probability $\tau$, BSC($\tau$), then 
the estimate $\hat{W}$ will equal $A \oplus D$ with high probability. Our 
first assumption is thus that the rate R = k/n of the code G 
satisfies 

$$R = k/n < 1 - h_b(\tau),$$

where $1 - h_b(\tau)$ is the BSC($\tau$) channel capacity and 
$h_b(p) := -p \log_2 p - (1 - p) \log_2(1 - p)$ is the binary entropy function. 
Second, we require $\tau$ to be larger than p but smaller than 0.5, 
i.e., $0.5 > \tau > p$. Recall that p is the noise parameter of the 
probe channel, the BSC(p). With this relation between p and 
$\tau$ we write 

$$R = k/n < 1 - h_b(\tau) < 1 - h_b(p),$$

or, equivalently, 

$$\frac{m}{n} > h_b(\tau) > h_b(p).$$
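
The operating-point assumptions can be checked numerically for a candidate design. The sketch below is illustrative only; the function names and the example parameters (n = 1000, τ = 0.3, p = 0.2) are assumptions, not values taken from the paper.

```python
import math

def binary_entropy(p):
    """h_b(p) = -p log2 p - (1 - p) log2(1 - p), with h_b(0) = h_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def satisfies_assumptions(n, k, tau, p):
    """Check 0.5 > tau > p together with R = k/n < 1 - h_b(tau)."""
    rate = k / n
    return p < tau < 0.5 and rate < 1 - binary_entropy(tau)

def satisfies_syndrome_form(n, k, tau, p):
    """Equivalent form in terms of m = n - k: m/n > h_b(tau) > h_b(p)."""
    m = n - k
    return m / n > binary_entropy(tau) > binary_entropy(p)

# Hypothetical design points
print(satisfies_assumptions(1000, 100, 0.3, 0.2))   # rate 0.1 is below 1 - h_b(0.3)
print(satisfies_assumptions(1000, 200, 0.3, 0.2))   # rate 0.2 is too high
```

Since $1 - h_b(0.3)$ is roughly 0.119, a rate of 0.1 meets the first assumption while a rate of 0.2 does not.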



In many practical realizations of fuzzy commitment the 
threshold test (1) is replaced with a hash check. Namely, in 
order to verify whether the random vector $G^T Z$ has been 
recovered exactly, a cryptographic hash of $G^T Z$ (alternately 
of Z) is also stored at the access control device. This stored 
hash must match the hash of $D \oplus L \oplus S \oplus \hat{W}$ for 
access to be granted. However, cryptographic hashes are not 
information theoretically secure; they are only computationally 
secure. Since our focus is on information theoretic security, 
a cryptographic hash cannot be used as part of our system. 
Thus, in the systems analyzed in this work, we do not 
use cryptographic hashes and, instead, rely on the threshold 
test (1). 

B. Secure Sketch Systems based on ECC 

We now introduce the second family of biometric storage 
systems studied, called secure sketch systems. While, as was 
the case for fuzzy commitment, there are other ways to 
develop a secure sketch, we concentrate on secure sketches 
implemented using linear error correcting codes. The baseline 
two-factor secure sketch scheme is diagrammed in Fig. 5 
and the keyless variant in Fig. 6. Following the notation of 
Sec. III-A, we denote by H the m × n parity check matrix of 
a binary [n, k] linear error correcting code with m = n - k. 

Enrollment: The enrollment procedure first generates the 
key sequence $K := (K_1, \ldots, K_m)$ as an independent i.i.d. 
Bernoulli(0.5) sequence. The stored data S is the length-m 
syndrome HA of the enrollment biometric feature vector, masked 
by the key: 

$$S = HA \oplus K.$$

Fig. 5. A two-factor secure sketch system stores the bitwise XOR of the 
syndrome vector of a linear error correcting code generated by the enrollment 
biometric and a randomly generated secret key. 

Fig. 6. A keyless secure sketch system stores the syndrome vector of a linear 
error correcting code generated by the enrollment biometric. 


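Authentication: The authentication procedure performs 
syndrome decoding to produce an estimate $\hat{W}$ of $A \oplus D$ as 

$$\hat{W} = \arg\min_{W : HW = HD \oplus L \oplus S} d(W).$$

The authentication decision is made using the threshold test 

$$d(\hat{W}) \underset{\hat{\theta} = 0}{\overset{\hat{\theta} = 1}{\lessgtr}} \tau n.$$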
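
A secure sketch along these lines can be sketched on the same kind of toy code. As before, this is a hedged illustration under stated assumptions (a parity check matrix of a [9, 1] repetition code, brute-force syndrome decoding, τ = 0.3, strict inequality in the threshold test); the function names are hypothetical.

```python
import itertools
import random

N, M = 9, 8                           # toy code with n = 9, m = n - k = 8 (assumption)
# m x n parity check matrix: row r compares bit r with the last bit
H = [[int(j == r or j == N - 1) for j in range(N)] for r in range(M)]

def xor(*vectors):
    """Component-wise modulo-two sum of equal-length bit vectors."""
    return [sum(bits) % 2 for bits in zip(*vectors)]

def syndrome(v):
    """Compute Hv over GF(2)."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

def coset_leader(s):
    """Brute-force syndrome decoding: a lowest-weight w with Hw = s."""
    for weight in range(N + 1):
        for support in itertools.combinations(range(N), weight):
            w = [int(j in support) for j in range(N)]
            if syndrome(w) == s:
                return w
    raise ValueError("no vector with this syndrome")

def enroll(a, rng):
    """Return (stored data S, key K) with S = HA xor K; both have length m."""
    key = [rng.getrandbits(1) for _ in range(M)]
    return xor(syndrome(a), key), key

def authenticate(d, l, stored, tau):
    """Decode the coset with syndrome HD xor L xor S, then apply the threshold."""
    w_hat = coset_leader(xor(syndrome(d), l, stored))
    return int(sum(w_hat) < tau * N)

def flip(v, positions):
    """Flip the bits of v at the given positions."""
    return [b ^ int(j in positions) for j, b in enumerate(v)]

rng = random.Random(7)
a = [rng.getrandbits(1) for _ in range(N)]
stored, key = enroll(a, rng)
print(len(stored))                                               # prints 8
print(authenticate(flip(a, {1, 4}), key, stored, tau=0.3))       # prints 1
print(authenticate(flip(a, {1, 3, 4, 6}), key, stored, tau=0.3)) # prints 0
```

Note that the stored vector has length m = 8 here, whereas a fuzzy commitment built on the same code would store n = 9 bits.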



The assumptions on the values of $\tau$ and the coding rate R of 
the ECC are identical to those made in Sec. III-A. In practical 
implementations, cryptographic hashes are often also stored 
and used to verify the correctness of the syndrome decoding 
procedure. However, for the reasons already discussed in the 
context of fuzzy commitment, we do not employ cryptographic 
hashes in our analysis. 

C. Equivalence of Fuzzy Commitment and Secure Sketch 

We now develop an equivalence between the properties of 
the fuzzy commitment and secure sketch schemes presented in 
the previous two subsections. We show the conceptual equivalence 
between the two architectures and derive expressions 
for the performance metrics defined in Section II-B, showing 
that the performance is the same. 

Reviewing the decoding procedures of fuzzy commitment 
and secure sketch one sees that the procedures are nearly iden- 
tical. The authentication decision is determined by whether 
or not W, the lowest Hamming weight sequence in a given 
coset, has Hamming weight greater or less than $\tau n$. The coset 
is specified by its syndrome and the only difference between 
the systems is how this syndrome is computed. 



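In the two-factor secure sketch system, the syndrome is 
specified as 

$$\sigma_{SS}(D, L, S) = HD \oplus L \oplus S = H(A \oplus D) \oplus K \oplus L. \tag{2}$$

In the two-factor fuzzy commitment system, the syndrome is 
specified as 

$$\begin{aligned} \sigma_{FC}(D, L, S) &= H(D \oplus L \oplus S) \\ &= H(A \oplus D) \oplus H G^T Z \oplus H(K \oplus L) \\ &= H(A \oplus D) \oplus H(K \oplus L), \end{aligned} \tag{3}$$

where the last step uses $H G^T = 0$. 
The decision for $\hat{\theta}$ is a deterministic function of the syndrome, 
defined identically for both systems. 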

In both systems, during the authentication of the legitimate 
user, where D = B and L = K, the computed syndrome 
is identical and equal to $H(A \oplus B)$. Note that this is true 
of both keyed and keyless variants of the systems. Thus, the 
distribution of $\hat{\theta}$ given $\theta = 1$ is identical for both types of 
systems and thus the FRR is identical. 
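
The claim that both architectures compute the same syndrome for a legitimate user can be checked on random inputs. The following sketch assumes a toy [9, 1] repetition code and hypothetical helper names; it is a numerical illustration of the identity, not a proof.

```python
import random

N, K_DIM = 9, 1                       # toy [9, 1] repetition code (assumption)
M = N - K_DIM
G = [[1] * N]                         # k x n generator matrix
H = [[int(j == r or j == N - 1) for j in range(N)] for r in range(M)]

def xor(*vectors):
    """Component-wise modulo-two sum of equal-length bit vectors."""
    return [sum(bits) % 2 for bits in zip(*vectors)]

def mat_vec(matrix, v):
    """Matrix-vector product over GF(2)."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in matrix]

def random_bits(length, rng):
    return [rng.getrandbits(1) for _ in range(length)]

def legitimate_syndromes(rng):
    """Return (secure sketch syndrome, fuzzy commitment syndrome, H(A xor B))
    for a legitimate user, i.e. D = B and L = K in both systems."""
    a, b = random_bits(N, rng), random_bits(N, rng)
    # Secure sketch: S = HA xor K, syndrome = HB xor K xor S
    k_ss = random_bits(M, rng)
    s_ss = xor(mat_vec(H, a), k_ss)
    sigma_ss = xor(mat_vec(H, b), k_ss, s_ss)
    # Fuzzy commitment: S = A xor G^T Z xor K, syndrome = H(B xor K xor S)
    k_fc = random_bits(N, rng)
    z = random_bits(K_DIM, rng)
    codeword = [sum(z[i] * G[i][j] for i in range(K_DIM)) % 2 for j in range(N)]
    s_fc = xor(a, codeword, k_fc)
    sigma_fc = mat_vec(H, xor(b, k_fc, s_fc))
    return sigma_ss, sigma_fc, mat_vec(H, xor(a, b))

rng = random.Random(0)
trials = [legitimate_syndromes(rng) for _ in range(100)]
print(all(ss == fc == ref for ss, fc, ref in trials))   # prints True
```

Because the threshold decision depends only on this syndrome, equal syndromes imply equal decisions for the legitimate user in this toy setting.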

In determining the FAR, i.e., the case of an attack by an 
uninformed adversary, the input vectors (D, L) = (C, J) can 
have an arbitrary joint distribution, but must be independent 
of the pair (A, K). Regardless of the distribution of (C, J), 
the syndrome in both systems is i.i.d. Bernoulli(0.5), since A 
is assumed to be an independent i.i.d. Bernoulli(0.5) sequence 
and H has full row rank (cf. Lemma 1 below). Since the 
syndromes are equal in distribution for both systems, the 
authentication decisions $\hat{\theta}$ are also equal in distribution for 
both systems, and hence the FAR performance is the same. 

Determining the SAR of these systems requires considera- 
tion of scenarios when the adversary has access to A, S, and/or 
K. In contrast to the scenario considered for the FAR analysis, 
the availability of this additional information may allow the 
adversary to alter the distribution of the decoding syndrome. 
However, as we will see in Theorem 1 below, the SAR for 
secure sketch and fuzzy commitment is also the same. 

Before we proceed, consider the following result that will 
be useful in understanding and proving some of the theorems 
that follow: 

Lemma 1 Let A be a length-n i.i.d. Bernoulli(0.5) random
vector and let $H$ and $\tilde{H}$ be, respectively, $m \times n$ and $\tilde{m} \times n$ full
row-rank binary matrices whose rows are linearly independent
of each other. Then, for any pair of binary vectors $s$ and $\tilde{s}$
of lengths $m$ and $\tilde{m}$ respectively,
$\Pr[HA = s \mid \tilde{H}A = \tilde{s}] = \Pr[HA = s] = 2^{-m}$.
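Lemma 1 can be confirmed by brute force on a small instance. The following is a sketch under assumed parameters (n = 5, a 2 x 5 matrix H and a 1 x 5 matrix H_tilde whose three rows are jointly linearly independent); the matrices and names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction

def parity(x: int) -> int:
    """Sum of the bits of x modulo 2."""
    return bin(x).count("1") & 1

def syndrome(H, v: int) -> int:
    """Multiply binary matrix H (list of row bitmasks) by vector v over GF(2)."""
    return sum(parity(row & v) << i for i, row in enumerate(H))

n = 5
H = [0b10011, 0b01101]   # m = 2 rows (assumed example)
H_tilde = [0b00111]      # m~ = 1 row, independent of the rows of H

# Enumerate all 2^n equally likely realizations of A and tabulate both syndromes.
joint = Counter()
for a in range(1 << n):
    joint[(syndrome(H, a), syndrome(H_tilde, a))] += 1

def cond_prob(s: int, s_tilde: int) -> Fraction:
    """Exact Pr[HA = s | H~A = s~] obtained from the enumeration."""
    denom = sum(c for (_, y), c in joint.items() if y == s_tilde)
    return Fraction(joint[(s, s_tilde)], denom)

m = len(H)
lemma_holds = all(cond_prob(s, st) == Fraction(1, 2 ** m)
                  for s in range(1 << m) for st in range(1 << len(H_tilde)))
print(lemma_holds)
```

Exact rational arithmetic is used so that the comparison with $2^{-m}$ involves no floating-point tolerance.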

The proof of this lemma appears in Appendix A. Note
that, since the channel codes are assumed to operate at a
rate R = k/n which is below capacity, they have a positive
error exponent E(R) > 0. This means that the probability of
decoding error when using these codes on a BSC-p is bounded
as

$$\Pr[\text{decoding error}] \le 2^{-nE(R) + o(n)},$$

where $E(R) = \min_q \left( D(q\|p) + \max\{1 - h_b(q) - R,\, 0\} \right)$
and the KL divergence between two Bernoulli distributions,
Bernoulli(q) and Bernoulli(p), is defined as

$$D(q\|p) := q \log_2 \frac{q}{p} + (1 - q) \log_2 \frac{1 - q}{1 - p}.$$

It is well known that, for sufficiently large n, there exist code
constructions that satisfy these properties [20].
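To get a feel for the exponent, the expression for E(R) can be evaluated numerically. The sketch below approximates the minimization over q by a simple grid search; the grid resolution and the function names are assumptions made for illustration, not part of the analysis.

```python
import math

def h_b(q: float) -> float:
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def kl(q: float, p: float) -> float:
    """KL divergence D(q||p) in bits between Bernoulli(q) and Bernoulli(p), 0 < p < 1."""
    total = 0.0
    if q > 0:
        total += q * math.log2(q / p)
    if q < 1:
        total += (1 - q) * math.log2((1 - q) / (1 - p))
    return total

def error_exponent(R: float, p: float, grid: int = 10000) -> float:
    """Grid-search approximation of min_q [ D(q||p) + max(1 - h_b(q) - R, 0) ]."""
    return min(kl(i / grid, p) + max(1 - h_b(i / grid) - R, 0.0)
               for i in range(grid + 1))

p = 0.1
capacity = 1 - h_b(p)                       # capacity of the BSC-p
exponent_below = error_exponent(0.3, p)     # rate below capacity
exponent_above = error_exponent(0.6, p)     # rate above capacity
print(capacity, exponent_below, exponent_above)
```

For a rate below capacity the minimum is strictly positive, while for a rate above capacity the choice q = p drives both terms to zero, so the bound no longer decays.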

Theorem 1 The FRR and FAR of both keyed and keyless
variants of fuzzy commitment and secure sketch are the same
and are bounded as

(i) $P_{FR} \le 2^{-nD(\tau\|p)} + 2^{-nE(R) + o(n)}$,

(ii) $P_{FA} \le 2^{-n\left(\frac{m}{n} - h_b(\tau)\right)}$.

The SAR of the two-factor (keyed) fuzzy commitment and
secure sketch schemes for various cases of data exposure are
identical and given by

(iii) $P_{SA}(K) = P_{FA}$,

(iv) $P_{SA}(A) = P_{FA}$,

(v) $P_{SA}(S) = P_{SA}(A, K) = P_{SA}(A, S) = P_{SA}(S, K) = P_{SA}(A, S, K) = 1$.

The SAR of the keyless fuzzy commitment and secure sketch
schemes for various cases of data exposure are identical and
given by

(vi) $P_{SA}(S) = P_{SA}(A) = P_{SA}(A, S) = 1$.
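As a sanity check on part (ii), the FAR of a toy secure sketch can be computed exactly and compared with the bound. The sketch below assumes a hypothetical systematic parity-check matrix $H = [P \mid I_m]$ with n = 15 and m = 11, and an acceptance rule that admits any coset containing a vector of weight at most t (t plays the role of $\tau n$). Since an uninformed adversary induces a uniform syndrome, the FAR is the fraction of accepting syndromes.

```python
import math
from itertools import combinations

def parity(x: int) -> int:
    """Sum of the bits of x modulo 2."""
    return bin(x).count("1") & 1

def syndrome(H, v: int) -> int:
    """Multiply binary matrix H (list of row bitmasks) by vector v over GF(2)."""
    return sum(parity(row & v) << i for i, row in enumerate(H))

def h_b(q: float) -> float:
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

n, m, t = 15, 11, 2
# Assumed example: arbitrary 4-bit patterns P next to an identity block,
# which guarantees that H has full row rank m.
P = [3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15]
H = [(P[i] << m) | (1 << i) for i in range(m)]

# A syndrome is accepting if its coset contains a vector of weight <= t.
accepting = set()
for w in range(t + 1):
    for positions in combinations(range(n), w):
        v = sum(1 << i for i in positions)
        accepting.add(syndrome(H, v))

far_exact = len(accepting) / 2 ** m
far_bound = 2 ** (-n * (m / n - h_b(t / n)))
print(far_exact, far_bound)
```

The bound is loose at such short block lengths, but the exact value never exceeds it because the number of vectors of weight at most $\tau n$ is at most $2^{n h_b(\tau)}$.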

Please refer to Appendix B for the proof of the theorem.
In parts (i) and (ii) the theorem characterizes exponentially
decaying upper bounds on the FRR and FAR, and hence
also lower bounds on the exponents. In order to obtain these
exponentially decaying bounds, the operating parameters must
satisfy the previously listed assumptions, that is, $0.5 > \tau > p$
and $m/n > h_b(\tau)$. Note that for all of our systems, knowledge
of the stored data S drastically improves the ability of the
adversary to gain access: the SAR is equal to one for an
adversary who knows S, cf. parts (v) and (vi) above. This is
because, as is formalized in the proof, an adversary with
knowledge of S can choose (C, J) so that the decoding coset
contains a low-weight error sequence with probability one.
In fact, this limitation is not unique to ECC-based systems,
as the following theorem shows.

Theorem 2 For any two-factor system,

(i) $P_{SA}(S) \ge 1 - P_{FR}$.

If for every $S \in \mathcal{S}$, there exist D, L such that $g(D, L, S) = 1$,
then

(ii) $P_{SA}(S) = 1$.

The proof appears in Appendix C.
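The attack with knowledge of S is easy to reproduce on a toy two-factor secure sketch. The sketch below assumes a (7,4) Hamming parity-check matrix, a brute-force coset decoder, and a deliberately strict acceptance threshold of weight zero; the adversary's choice D = 0, L = S is one hypothetical strategy among many that force the all-zeros syndrome.

```python
import random

def parity(x: int) -> int:
    """Sum of the bits of x modulo 2."""
    return bin(x).count("1") & 1

def syndrome(H, v: int) -> int:
    """Multiply binary matrix H (list of row bitmasks) by vector v over GF(2)."""
    return sum(parity(row & v) << i for i, row in enumerate(H))

n, m = 7, 3
H = [0b1010101, 0b0110011, 0b0001111]  # illustrative parity-check matrix

def coset_leader_weight(syn: int) -> int:
    """Smallest Hamming weight among all vectors whose syndrome equals syn."""
    return min(bin(v).count("1") for v in range(1 << n) if syndrome(H, v) == syn)

def decide(D: int, L: int, S: int, max_weight: int) -> int:
    """Accept (1) if the coset selected by HD xor L xor S has a low-weight member."""
    return int(coset_leader_weight(syndrome(H, D) ^ L ^ S) <= max_weight)

rng = random.Random(1)
trials = 200
informed = uninformed = 0
for _ in range(trials):
    A, K = rng.getrandbits(n), rng.getrandbits(m)
    S = syndrome(H, A) ^ K                      # stored data of the target system
    informed += decide(0, S, S, max_weight=0)   # adversary uses only S
    uninformed += decide(rng.getrandbits(n), rng.getrandbits(m), S, max_weight=0)
print(informed, uninformed)
```

The informed adversary is accepted in every trial without learning anything about A or K, whereas random guesses succeed only when they happen to hit the accepting syndrome.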

Fuzzy commitment and secure sketch also have identical 
privacy leakage as demonstrated by the following theorem. 

Theorem 3 In the two-factor fuzzy commitment and secure
sketch systems, the privacy leakage of A from K, from S, or
from (S, K) is, respectively,

(i) $I(A; K) = 0$,

(ii) $I(A; S) = 0$,

(iii) $I(A; S, K) = m = n(1 - R) > 0$.

In the keyless variant of fuzzy commitment and secure sketch
the privacy leakage of A from S is

(iv) $I(A; S) = m = n(1 - R) > 0$.

The proof of this theorem is given in Appendix D. From an
authentication perspective, it is interesting that the additional
independent source of randomness Z in fuzzy commitment
based systems does not improve the privacy leakage properties
in comparison to secure sketch based systems where such
randomness is unavailable.
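For the secure sketch, the leakage values in Theorem 3 can be reproduced by exhaustive enumeration on a small instance. The sketch below assumes n = 4, m = 2, a hypothetical full-rank H, uniform A and K, and stored data $S = HA \oplus K$ (two-factor) or $S = HA$ (keyless).

```python
from collections import Counter
import math

def parity(x: int) -> int:
    """Sum of the bits of x modulo 2."""
    return bin(x).count("1") & 1

def syndrome(H, v: int) -> int:
    """Multiply binary matrix H (list of row bitmasks) by vector v over GF(2)."""
    return sum(parity(row & v) << i for i, row in enumerate(H))

def mutual_information(pairs) -> float:
    """I(X;Y) in bits for a list of equally likely (x, y) outcomes."""
    total = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / total * math.log2(c * total / (px[x] * py[y]))
               for (x, y), c in joint.items())

n, m = 4, 2
H = [0b1011, 0b0110]  # assumed full-row-rank example

# All equally likely (A, K) pairs with the resulting two-factor stored data S.
outcomes = [(a, k, syndrome(H, a) ^ k) for a in range(1 << n) for k in range(1 << m)]

I_A_K = mutual_information([(a, k) for a, k, s in outcomes])
I_A_S = mutual_information([(a, s) for a, k, s in outcomes])
I_A_SK = mutual_information([(a, (s, k)) for a, k, s in outcomes])
I_A_S_keyless = mutual_information([(a, syndrome(H, a)) for a in range(1 << n)])
print(I_A_K, I_A_S, I_A_SK, I_A_S_keyless)
```

The key acts as a one-time pad on the syndrome, so S alone reveals nothing, while S and K together reveal exactly the m syndrome bits of A.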

The fuzzy commitment and secure sketch systems are
equivalent in terms of many performance metrics but they
differ in terms of storage and key length requirements. The
fuzzy commitment system requires n bits to store the data
since H(S) = n. It also uses an n-bit key to mask the stored
data in the two-factor variant. On the other hand, the secure
sketch system requires only m bits for storage since H(S) = m,
due to the fact that only the syndrome of A is being stored.
Similarly, it also uses only an m-bit key to mask the stored
data in the two-factor variant.

IV. Linkage Resistance and Revocability 
Properties 

In this section we consider two desirable properties for 
secure biometrics - revocability and resistance to linkage 
attacks - and study them in the context of noisy enrollments 
at multiple access control devices. We will only consider 
two-factor systems in this section. Although the results to 
be presented in this section apply equally to both secure 
sketch and fuzzy commitment based systems, proofs will be 
provided only for secure-sketch based systems since the two
types of systems are performance-equivalent as discussed in
Section III-C.

Revocability is the ability to tolerate partial compromises 
of data. By partial compromise we mean that, in a two-factor 
access control system, either the key or the stored data has 
been revealed to the adversary, but not both. On the other hand, 
we say that a two-factor system is fully compromised if both 
the key and the stored data have been revealed to the adversary. 
A secure biometric is said to be revocable if, given knowledge 
of a partial compromise, the user or a system administrator can 
delete certain data and establish a new enrollment based on the 
same biometric without any loss in privacy or authentication 
performance. 

Linkage attacks can occur in situations where the same 
biometric is used to enroll in multiple biometric systems, e.g., 
on several access control devices. If an adversary compromises 
a subset of the devices, the compromised data can be used to 
attack the remaining devices. The compromised data can both 
leak information about the underlying biometric and can be 
exploited to mount a successful attack, i.e., gain unauthorized 
access to, one of the remaining devices. 

TABLE I
Summary and Comparison of System Performance

| Metric | Keyless (Fuzzy Commitment and Secure Sketch) | Two-factor (Fuzzy Commitment and Secure Sketch) |
|---|---|---|
| False Rejection Rate | $P_{FR} \le 2^{-nD(\tau \Vert p)} + 2^{-nE(R) + o(n)}$ | $P_{FR} \le 2^{-nD(\tau \Vert p)} + 2^{-nE(R) + o(n)}$ |
| False Acceptance Rate | $P_{FA} \le 2^{-n(m/n - h_b(\tau))}$ | $P_{FA} \le 2^{-n(m/n - h_b(\tau))}$ |
| Successful Attack Rate | $P_{SA}(A) = P_{SA}(S) = 1$ | $P_{SA}(A) = P_{SA}(K) = P_{FA}$; $P_{SA}(S) = P_{SA}(A, K) = 1$ |
| Privacy Leakage | $I(A; S) = m$ | $I(A; K) = 0$, $I(A; S) = 0$, $I(A; S, K) = m$ |
| Storage Requirements | Fuzzy commitment: $H(S) = n$; secure sketch: $H(S) = m$ | Fuzzy commitment: $H(S) = H(K) = n$; secure sketch: $H(S) = H(K) = m$ |

A. Performance Measures for Multiple Biometric Systems

We now present our model for parallel enrollment across
multiple biometric systems. We assume that the biometric
in question has been enrolled in u systems. Each of the u
biometric systems has an enrollment vector. These vectors,
$A_i$, $i \in \{1, \ldots, u\}$, are related in a conditionally independent
manner to a common underlying biometric $A_0$ according to
the measurement model in Section II. In other words,

$$P_{A_i \mid A_0}(a_i \mid a) = (1 - p_i)^{n - d_H(a_i, a)} \, p_i^{d_H(a_i, a)},$$

where $p_i \in [0, 0.5)$, all vectors are binary and $d_H(\cdot, \cdot)$ is
the Hamming distance between its arguments. For convenience,
we define $p_0 = 0$. Encoding and decoding functions
$\{F_i(\cdot), g_i(\cdot)\}_{i=1}^{u}$ are paired and need not be identical for all
systems. At enrollment, each system $i \in \{1, \ldots, u\}$ observes
$A_i$, and the stored data and key for system i are generated as
$(S_i, K_i) = F_i(A_i)$. The joint distribution across the u systems
is given by

$$P_{S^u, K^u, A_0}(s^u, k^u, a) = P_{A_0}(a) \prod_{i=1}^{u} P_{S_i, K_i \mid A_0}(s_i, k_i \mid a), \quad (4)$$

where

$$P_{S_i, K_i \mid A_0}(s_i, k_i \mid a) = \sum_{a_i} \Pr\left[F_i(a_i) = (s_i, k_i)\right] P_{A_i \mid A_0}(a_i \mid a),$$

and $S^u$ and $K^u$ are respectively the u-tuples of stored data
and key vectors.
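The measurement model is straightforward to simulate: each enrollment is the underlying biometric passed through an independent binary symmetric channel. The sketch below uses hypothetical helper names and represents binary vectors as Python integers; it also evaluates the likelihood of one enrollment given the underlying biometric.

```python
import random

def hamming_distance(x: int, y: int) -> int:
    """Number of positions in which the binary vectors x and y differ."""
    return bin(x ^ y).count("1")

def likelihood(a_i: int, a0: int, n: int, p: float) -> float:
    """Probability of observing enrollment a_i given biometric a0 under a BSC-p."""
    d = hamming_distance(a_i, a0)
    return (1 - p) ** (n - d) * p ** d

def bsc(a: int, n: int, p: float, rng: random.Random) -> int:
    """Flip each of the n bits of a independently with probability p."""
    noise = sum((rng.random() < p) << i for i in range(n))
    return a ^ noise

def enroll(a0: int, n: int, ps, rng: random.Random):
    """Conditionally independent enrollments, one per crossover probability in ps."""
    return [bsc(a0, n, p, rng) for p in ps]

n = 6
a0 = 0b101010
rng = random.Random(2)
enrollments = enroll(a0, n, [0.0, 0.05, 0.2], rng)
# The likelihood is a valid distribution: it sums to one over all possible enrollments.
total = sum(likelihood(x, a0, n, 0.2) for x in range(1 << n))
print(enrollments, total)
```

Setting a crossover probability to zero recovers the noiseless enrollment case considered in the special cases of the theorems that follow.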

Recall from the discussion of Sec. II (cf. Fig. 2) that
the legitimate user of system i will try to authenticate using
$(B, K_i)$ while an adversary will use some (C, J). The
crossover probability of system-j's probe channel will be
denoted by $\alpha_j \in [0, 0.5)$. The FRR and FAR are, respectively,
given by

$$P_{FR}(i) := \Pr\left[g_i(B, K_i, S_i) = 0\right],$$

$$P_{FA}(i) := \max_{P_{C,J}} \Pr\left[g_i(C, J, S_i) = 1\right],$$

which are the same as the definitions for a single system in
isolation.

In contrast, the existence of multiple systems necessitates
a generalization of the definition of SAR, in order to
account for compromises across multiple biometric systems.
Expanding upon the framework of Sec. II, we define $\mathcal{V}$ to
be a subset of $\{S_1, K_1, S_2, K_2, \ldots, S_u, K_u\}$. Equivalently
we write $\mathcal{V} = \cup_{i=1}^{u} \mathcal{V}_i$ where $\mathcal{V}_i \subseteq \{S_i, K_i\}$, possibly the
empty set. Also, to be able to study the effect of compromised
enrollment biometrics, we define the set $\mathcal{A}$ to be a subset of
$\{A_0, A_1, A_2, \ldots, A_u\}$.

Given knowledge of $\mathcal{V}$ and $\mathcal{A}$ by an adversary, the SAR
against system i is

$$P_{SA}(i, \mathcal{V}, \mathcal{A}) := \max_{P_{C,J \mid \mathcal{V}, \mathcal{A}}} \Pr\left[g_i(C, J, S_i) = 1\right].$$

B. Privacy Leakage Across Multiple Systems 

In this section we give a tight characterization of the
privacy leakage, i.e., the amount of information leaked about
the user's biometric when some subset of the stored data is
compromised. In the analysis that follows, we assume that all u
biometric systems are secure sketch-based systems with parity
check matrices $H_1, \ldots, H_u$, which may have different row-sizes
but the same column-size. As we have already proved the
equivalence between secure sketch and fuzzy commitment in
Section III, the results derived for multiple secure sketch-based
systems immediately extend to multiple fuzzy commitment-based
biometric systems. In other words, statements about the
parity check matrices $H_i$ can be appropriately modified into
similar statements about the generator matrices $G_i$ used in
fuzzy commitment-based systems.

While deriving the privacy leakage, we also state simplifications
for a number of interesting special cases. In
particular we consider both the "noiseless" enrollment case
where $A_0 = A_1 = \ldots = A_u$ and the "identical" enrollment
function case where all systems use the same ECC, i.e.,
$H_1 = \ldots = H_u$. We also write $\mathrm{rank}(H_1, \ldots, H_l)$ to denote
the rank of $[H_1^T, \ldots, H_l^T]$.

Our main result connects the amount of information leakage 
with an easily-characterized rank property of the parity check 
matrices of the compromised systems. 

Theorem 4 Given the enrollment model of (4), assume, without
loss of generality, an ordering of the systems such that for
some index l, $0 \le l \le u$, $\mathcal{V}_i = \{S_i, K_i\}$ for all $i \le l$ and
$\mathcal{V}_i \subsetneq \{S_i, K_i\}$ for all $i > l$. Then, the information about $A_0$
leaked by $\mathcal{V} = \cup_{i=1}^{u} \mathcal{V}_i$ is

$$I(A_0; \mathcal{V}) = \begin{cases} 0 & \text{if } l = 0, \\ I(A_0; H_1 A_1, \ldots, H_l A_l) & \text{else.} \end{cases}$$

Additionally:

(i) In general,

$$I(A_0; \mathcal{V}) \le \mathrm{rank}(H_1, \ldots, H_l).$$

(ii) For noiseless, non-identical enrollment functions,

$$I(A_0; \mathcal{V}) = \mathrm{rank}(H_1, \ldots, H_l),$$

while for the identical enrollment function case with $l \ge 1$,
we have

$$I(A_0; \mathcal{V}) = \mathrm{rank}(H_1).$$

The proof of this theorem is given in Appendix E. Importantly,
this theorem tells us that information about the
underlying biometric is leaked only if there is at least one
fully compromised system (i.e., l > 0). Hence, unless both
the key and stored data of a particular system have been
compromised, that system can be revoked by erasing the
uncompromised data (e.g., the key if the stored data has been
leaked). The theorem also indicates that biometric measurement
noise can only help mask the private data. To see this, consider
the case l > 0 and note that if the enrollment noise is high
enough, the mutual information between $A_0$ and $H_1 A_1, \ldots, H_l A_l$ can
be quite small, certainly smaller than when there is no enrollment
noise. This last statement follows from the data
processing inequality, which tells us that the privacy leakage
when enrollments are noisy is upper bounded by the privacy
leakage when enrollments are noiseless.

Part (i) also tells us that the privacy leakage depends on 
the rank of the matrix formed by stacking the parity check 
matrices of the fully compromised systems. We term this the 
"collective" rank of the set in question. The collective rank 
is at most equal to the sum of the ranks of the individual 
parity check matrices and will be strictly less if there is linear 
dependence between the rows of the matrices. Further, as part 
(ii) tells us, in the special case of noiseless enrollments we 
can make an exact statement about privacy leakage in terms 
of collective rank. Finally, in the special case of noiseless 
enrollments and identical enrollment functions, the first fully 
compromised system leaks all the information there is to be 
leaked about the underlying biometric. 
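The collective rank is simple to compute by Gaussian elimination over GF(2). The sketch below is illustrative: matrices are lists of row bitmasks, the example matrices are hypothetical, and the reduction uses a standard XOR-basis construction.

```python
def gf2_rank(rows) -> int:
    """Rank over GF(2) of a matrix given as a list of integer row bitmasks."""
    basis = []
    for r in rows:
        # Reduce r against the current basis; min() clears a leading bit when possible.
        for b in basis:
            r = min(r, r ^ b)
        if r:
            basis.append(r)
    return len(basis)

def collective_rank(*matrices) -> int:
    """Rank of the matrix formed by stacking the given parity-check matrices."""
    return gf2_rank([row for H in matrices for row in H])

# Assumed toy example with n = 6 and m = 2.
H1 = [0b110000, 0b001100]
H2 = [0b000011, 0b101010]

rank_distinct = collective_rank(H1, H2)    # rows are jointly independent
rank_identical = collective_rank(H1, H1)   # identical enrollment functions
print(rank_distinct, rank_identical)
```

With noiseless enrollments, fully compromising both systems leaks `rank_distinct` bits when the codes differ but only `rank_identical` bits when they coincide, which is the effect described above.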

We can sketch a candidate design rule arising from these
results. To obtain a set of systems that minimizes the privacy
leakage in the face of the compromise of some subset of the
stored data and keys, the collection of parity check matrices
should be designed to minimize the collective rank, i.e., to
maximize the linear dependencies across the matrices. Taken
to the extreme, one would simply use the same parity check
matrix for each system. Of course, at the same time the
matrices must individually specify good error correcting
codes, else the false rejection rate would be too high.
However, such minimal privacy leakage comes at a cost. As we
discuss in the next subsection, using the same parity check
matrix makes the remaining uncompromised systems more
vulnerable to false authentications. Thus, if we design the
multiple systems to minimize privacy leakage, we pay a price
in terms of the security of the individual systems.

C. Authentication Attacks with Multiple Systems 

In situations where some subset of systems based on the 
same biometric have been compromised, an attacker may be 
able to use the compromised data to enhance his ability to 
authenticate falsely. The following theorem states results on 
the successful attack rates for our two-factor secure biometric 
systems. The theorem is proved in Appendix F.

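Theorem 5 Let u noisy, non-identical enrollments be generated
for a secure two-factor biometric system (fuzzy commitment
or secure sketch). Consider any system $j \in \{1, \ldots, u\}$.

(i) If either $S_j \in \mathcal{V}_j$ or both $A_j \in \mathcal{A}$ and $K_j \in \mathcal{V}_j$, then

$$P_{SA}(j, \mathcal{V}, \mathcal{A}) = 1.$$

(ii) If $K_j \in \mathcal{V}_j$ and for some $i \ne j$, $A_i \in \mathcal{A}$ and $p_i < p_j$,
then

$$P_{SA}(j, \mathcal{V}, \mathcal{A}) \ge 1 - P_{FR}(j).$$

(iii) If $\mathcal{V}_j = \{\}$, the null set, then

$$P_{SA}(j, \mathcal{V}, \mathcal{A}) = P_{FA}(j).$$

(iv) If $S_j \notin \mathcal{V}_j$, $\mathcal{A} = \{\}$, and $\mathcal{V}_i \subsetneq \{S_i, K_i\}$ for each $i \ne j$,
then

$$P_{SA}(j, \mathcal{V}, \mathcal{A}) = P_{FA}(j).$$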

In part (i), an adversary who has access to the stored data
of the target system can easily find a low-weight element of
the coset corresponding to $S_j$, yielding access with probability
one as per Theorem 1(v).¹

In part (ii), the adversary has access to the key of the
system to be attacked and at least one enrollment biometric
of some other system $A_i$ or the ground truth biometric $A_0$.
In these settings we show that the adversary can use this data
to imitate a probe biometric of the legitimate user and launch
an authentication attack with a high probability of success.

In contrast, in parts (iii)-(iv), the adversary cannot do better
than the nominal false acceptance rate. In part (iii), neither the
key nor the stored data of the target system are compromised,
but the $\mathcal{V}_i$ for $i \ne j$ can be arbitrary. Then, because $K_j$
is independent of all other parts of the system, the attacker
cannot improve his probability of success over that of random
guessing. In part (iv), the key of the target system may be
compromised, but in all other systems only a strict subset
of the data is compromised (either just the stored data, just
the key, or neither) and, further, no enrollment biometrics are
compromised. In this situation the adversary is again not able
to authenticate with probability higher than the FAR.

The following theorem considers the effect of the joint
structure of the parity check matrices employed on different
access control devices on the probability of successful attack.
It establishes that if certain joint structure is present, the
adversary can leverage this structure to improve dramatically
the likelihood of being able to falsely authenticate on
uncompromised systems. The theorem is proved in Appendix G.

¹ Note that in this part if only $S_j$ is leaked, but not $K_j$, then this is a
revocable scenario, i.e., the old $K_j$ can be revoked and a new key assigned.
Until this is done, however, the probability of successful attack is one, as
given above. Once $K_j$ is revoked, the probability of successful attack becomes
$P_{FA}(j)$.

Theorem 6 Given the enrollment model of (4), assume the
two-factor systems are ordered such that there is some index
l, $1 \le l \le u$, such that $\mathcal{V}_i = \{S_i, K_i\}$ for all $i \le l$ and
$\mathcal{V}_i = \{K_i\}$ for all $i > l$. Let $\mathcal{A} = \{\}$. Now, consider any system
index $j \ge l + 1$.

(i) For noiseless enrollments, if $\mathrm{rank}(H_1, \ldots, H_l, H_j) = \mathrm{rank}(H_1, \ldots, H_l)$, then

$$P_{SA}(j, \mathcal{V}, \mathcal{A}) = 1.$$

(ii) For noisy enrollments, if $\mathrm{rank}(H_1, \ldots, H_l, H_j) = \mathrm{rank}(H_1, \ldots, H_l)$
and for all $0 \le i \le u$, $p_i < \alpha_j$,
where $\alpha_j$ is the crossover probability of system-j's probe
channel, then

$$P_{SA}(j, \mathcal{V}, \mathcal{A}) \ge 1 - P_{FR}(j).$$

(iii) If $\mathrm{rank}(H_1, \ldots, H_l, H_j) = \mathrm{rank}(H_1, \ldots, H_l) + \mathrm{rank}(H_j)$,
then (in either the noisy or noiseless case)

$$P_{SA}(j, \mathcal{V}, \mathcal{A}) = P_{FA}(j).$$

The conditions in the first two parts of Thm. 6 mean that
the row space of $H_j$ lies within the span of the rows of
$H_1, \ldots, H_l$. In this situation, an attacker can gain access with
high probability. In contrast, if the parity check matrix $H_j$
used to define the stored data in the system under attack is
linearly independent of the matrices defining the compromised
systems, then the compromised data is useless in attempts to
improve the successful attack rate beyond the nominal false
acceptance rate of the system.

To build intuition, we study the implications of Thm. 6
through a sequence of examples. In keeping with our previous
development, we consider secure sketch-based biometric
systems, though the results translate to fuzzy commitment-based
access control devices as well. In each example we
consider three biometric systems, u = 3. The three enrollment
matrices $H_1, H_2, H_3$ are each of size $m \times n$ and full rank
m, where n = 3m. We consider an adversary that is trying to
authenticate with respect to system #3, having gained access
to all data except $S_3$, i.e., $\mathcal{V} = \{S_1, K_1, S_2, K_2, K_3\}$. In
some of the examples, we will find it useful to refer back
to Lemma 1, which relates linear independence between the
rows of the parity check matrices to statistical independence
of the syndromes $H_i A_i$.

Example 1 (noiseless enrollments) Consider noiseless enrollments,
$A_0 = A_1 = A_2 = A_3$, and $H_3 = H_1 \oplus H_2$. In this
setting, using the elements of $\mathcal{V}$, the adversary can calculate
the stored data of the third system as $S_3 = S_1 \oplus S_2 \oplus K_1 \oplus K_2 \oplus K_3$.
Picking C (= D) such that $H_3 C = S_3$ and setting
J = L = the all-zeros syndrome, the adversary can force
the decoder to the coset containing the all-zeros vector. Recall
that the decoder looks for the lowest weight vector in the coset
with syndrome $H_3 D \oplus S_3 \oplus L$. The probability of success of
this attack is one.
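The attack of Example 1 can be replayed on a toy instance. The sketch below assumes m = 2, n = 6, hypothetical matrices $H_1$ and $H_2$ with $H_3 = H_1 \oplus H_2$, noiseless enrollments, and two-factor secure sketch storage $S_i = H_i A \oplus K_i$; the adversary sees only $S_1, K_1, S_2, K_2, K_3$.

```python
import random

def parity(x: int) -> int:
    """Sum of the bits of x modulo 2."""
    return bin(x).count("1") & 1

def syndrome(H, v: int) -> int:
    """Multiply binary matrix H (list of row bitmasks) by vector v over GF(2)."""
    return sum(parity(row & v) << i for i, row in enumerate(H))

n, m = 6, 2
H1 = [0b100000, 0b010000]
H2 = [0b001000, 0b000100]
H3 = [r1 ^ r2 for r1, r2 in zip(H1, H2)]   # H3 = H1 xor H2, still full rank

def attack_once(rng: random.Random) -> bool:
    A = rng.getrandbits(n)                                 # common noiseless biometric
    K1, K2, K3 = (rng.getrandbits(m) for _ in range(3))
    S1, S2, S3 = (syndrome(H, A) ^ K for H, K in ((H1, K1), (H2, K2), (H3, K3)))
    # Adversary reconstructs S3 from the compromised data only.
    S3_hat = S1 ^ S2 ^ K1 ^ K2 ^ K3
    # Pick any C with H3 C = S3 (brute force here) and use the all-zeros L.
    C = next(c for c in range(1 << n) if syndrome(H3, c) == S3_hat)
    L = 0
    decoder_syndrome = syndrome(H3, C) ^ L ^ S3
    # Success means the decoder is steered to the coset of the all-zeros vector.
    return S3_hat == S3 and decoder_syndrome == 0

rng = random.Random(3)
always_succeeds = all(attack_once(rng) for _ in range(100))
print(always_succeeds)
```

The decoder's coset contains the all-zeros vector, whose weight is below any threshold, so the attack succeeds regardless of the value of $\tau$.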



Example 2 (identical enrollment functions) Consider the setting
where $H_1 = H_2 = H_3$. If enrollments are noiseless then,
e.g., $S_3 = S_1 \oplus K_1 \oplus K_3$ and the attack of Example 1 works,
allowing the adversary to successfully access system #3 with
probability one. In fact, compromising the stored data and key
of any single system will allow an attacker to access any other
system whose key is compromised with probability one. If
enrollments are noisy but $p_1 = p_2 = p_3$ then $S_1 \oplus K_1$ specifies
a coset that contains a vector close to $A_0$. Pick any element
of this coset as D and use $K_3$ for L. These choices will yield
the same probability of successful attack as a legitimate probe
generated from $A_0$, i.e., at least $1 - P_{FR}$.

Example 3 (linearly independent enrollment functions) Now
consider the case when the rows of $H_1$, the rows of $H_2$, and
the rows of $H_3$ are all linearly independent of one another.
Then, by Lemma 1, whether or not enrollments are noisy, the
information about the biometric leaked by the compromised
data is independent of $S_3$. Hence, the compromised data
does not enhance the adversary's ability to authenticate falsely.

Example 3 suggests that a cross-system design of the
codes, i.e., $H_1, \ldots, H_u$, that minimizes the linear dependence
between parity check matrices can obviate the danger of
linkage attacks. However, it is not always possible to design
fully independent parity check matrices while maintaining
the desired full rank of each. This is due to dimensionality
restrictions. In the examples, m = n/3. Thus, if we added
another biometric system, i.e., u = 4, maintaining full linear
independence would not be possible.

Example 4 (partially linearly dependent enrollment functions)
Theorem 6 considers the two extreme cases of linear
dependence between the parity check matrix $H_j$ of the
system under attack and those of the compromised systems,
$H_1, \ldots, H_l$. Full linear dependence is considered in parts (i)
and (ii) of the theorem, and full linear independence in part
(iii). In this example we consider an intermediate scenario of
partial linear dependence.

In particular, let $H_a, H_b, H_c, H_d$ be full-rank $m/2 \times n$
matrices where all of the rows are linearly independent. Let
$H_1^T = [H_a^T \; H_b^T]$, $H_2^T = [H_a^T \; H_c^T]$, and $H_3^T = [H_a^T \; H_d^T]$.
Again let $\mathcal{V} = \{S_1, K_1, S_2, K_2, K_3\}$. The first half of the
vector $S_3 \oplus K_3$ equals $H_a A$, which, for noiseless enrollments,
is the same as the first half of the $S_1 \oplus K_1$ and $S_2 \oplus K_2$ vectors,
both of which can be calculated from the stored information.
However, by Lemma 1, the second half of the $S_3 \oplus K_3$ vector
is statistically independent of all compromised data.

We now describe a natural attack on the system described in
Example 4. First note that, in the same manner as in the earlier
examples, the attacker can set the first half of the syndrome
arbitrarily. One attack would be to pick these m/2 constraints
to eliminate as few low-weight sequences as possible. Ideally,
these constraints would be picked so that, regardless of the
remaining m/2 bits of the syndrome, each possible coset
(after all m syndrome bits are set) would contain at least
one low-weight sequence (i.e., a sequence with fewer than
$\tau n$ ones). Whether such an attack is possible depends on the
specific $H_a$ and $H_d$ matrices. One should note that low-weight
sequences are not uniformly distributed over the cosets of
$H_d$. This means that even determining whether such an attack
is possible for specific $H_a$ and $H_d$ matrices is likely quite
computationally challenging. These considerations illustrate
the difficulty of determining the SAR in these settings. At
a minimum, we can say that the SAR must be at least as large
as the FAR. This follows since the attacker can make the SAR
equal to the FAR simply by ignoring the compromised data
and setting all m syndrome bits at random.



D. Formulation of ECC Design Problem for Multiple Systems 

In the previous two subsections, we analyzed information
leakage and authentication attacks when an adversary has
compromised multiple enrollments based on the same underlying
biometric. Theorems 4 and 5 tell us that unless there are
fully compromised systems, no information is leaked about
the underlying biometric and there is no way to improve
the probability of successful attack beyond the nominal false
acceptance rate. Thus, in cases of only partial data compromise,
two-factor designs are secure against linkage attacks and can be
revoked.

One way to view these results is from the perspective of
reusability. A set of access-control systems can be thought of
as a series of re-enrollments established after successive data
compromise. If any one element, but not both, of the stored
data $S_i$ and key $K_i$ is lost, the user can simply destroy the
other and regenerate a fresh $(S_{i+1}, K_{i+1})$ pair. The previous,
partially compromised, enrollments do not cause any privacy
leakage, nor do they enhance the adversary's ability to attack
the newly enrolled system.

Furthermore, from Theorem 6 we learn that, in general, the
effectiveness of linkage attacks depends on the joint structure
of the error-correcting codes deployed. Examples 2-4 in
particular give hints as to how the collection of systems
can be jointly designed to mitigate the amount of privacy
leakage or minimize the successful attack rate when some
systems have been fully compromised. We observe from the
examples that there is a natural tradeoff between robustness
to privacy leakage and robustness to authentication attacks.
Linear dependence between parity check matrices results in
an increased probability of successful attack while linear
independence results in increased privacy leakage. We now
present a design formulation that formalizes this tradeoff.

Our objective is to design u parity-check matrices
$H_1, \ldots, H_u$, all full-rank $m \times n$ matrices, to optimize certain
properties. To define these properties we consider all $\binom{u}{L}$
cardinality-L subsets of the parity-check matrices. Denote the
$l$-th such subset as $\mathcal{S}_l$ for $1 \le l \le \binom{u}{L}$. The parameter L
corresponds to the number of biometric systems that the adversary
can potentially compromise, and the subset $\mathcal{S}_l$ represents one
set of systems that the adversary may have compromised. For any
subset $\mathcal{S}_l$, we define $\mathcal{S}_l[i]$ to be the index of the $i$-th parity check
matrix in the subset. That is, $1 \le i \le L$ and $1 \le \mathcal{S}_l[i] \le u$.
Further (with some abuse of notation) we define $H_{\mathcal{S}_l}$ to be
the $Lm \times n$ matrix formed by "stacking" all matrices in the
subset into a single matrix, i.e.,

$$H_{\mathcal{S}_l} = \begin{bmatrix} H_{\mathcal{S}_l[1]} \\ \vdots \\ H_{\mathcal{S}_l[L]} \end{bmatrix}.$$

We use $r_l$ to denote the collective rank of the $l$-th stacked matrix,
defined as

$$r_l = \mathrm{rank}(H_{\mathcal{S}_l}).$$

The collective rank is bounded by $1 \le r_l \le \min\{Lm, n\}$
and characterizes the privacy leakage when the adversary has gained
access both to the key and to the stored data of the L systems
in $\mathcal{S}_l$. Theorem 4 establishes that for noiseless enrollments,
the stacked rank is exactly equal to the privacy leakage, and
that for noisy enrollments, the stacked rank provides an upper
bound on the privacy leakage.

Now, for each subset $\mathcal{S}_l$ and system $j \in \{1, \ldots, u\}$, define
the residual rank of matrix $H_j$ as

$$t_{l,j} := \mathrm{rank}(H_{\mathcal{S}_l}, H_j) - \mathrm{rank}(H_{\mathcal{S}_l}).$$

Note that $t_{l,j} = 0$ if the row-space of $H_j$ is spanned by the
rows of $H_{\mathcal{S}_l}$, which would happen automatically if $j \in \mathcal{S}_l$.
Also, $0 \le t_{l,j} \le m$, with equality to m if all rows of $H_j$ are
linearly independent of the rows of $H_{\mathcal{S}_l}$. The residual rank
parameter provides a loose characterization of the systems'
resistance to linkage-based authentication attacks. Consider an
adversary that has compromised the keys and stored data of the
enrollments of the systems in $\mathcal{S}_l$. When $t_{l,j} = m$, the adversary
does not benefit from a higher probability of successful attack
for system j. On the other hand, when $t_{l,j} = 0$ and the key of
system-j is compromised, the adversary will be able to falsely
authenticate at system j with probability one if the enrollments
are noiseless, and with high probability even if the enrollments
are noisy. For intermediate values of $t_{l,j}$, determining the
corresponding linkage resistance against authentication attacks
is complicated, as was discussed in Example 4 of Section IV-C.
Thus the parameter $t_{l,j}$ is a rough measure of linkage attack
resistance. However, for noiseless enrollments, $t_{l,j}$ provides a
lower bound on the corresponding SAR (with the key of system j
also compromised) given by

$$P_{SA} \ge 2^{-t_{l,j}},$$

where $\mathcal{V}$ are the keys and stored data for the systems in $\mathcal{S}_l$.
This is because uniformly sampling from one of the $2^{t_{l,j}}$ cosets
containing the enrolled biometric is always a strategy that is
available to the attacker.
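The residual rank can be computed for the partially dependent construction of Example 4. The sketch below assumes m = 2, n = 6 and hypothetical single-row matrices for $H_a, H_b, H_c, H_d$; the residual rank is taken to be the increase in collective rank when $H_j$ is appended to the stack of compromised matrices, which is consistent with the properties listed above.

```python
def gf2_rank(rows) -> int:
    """Rank over GF(2) of a matrix given as a list of integer row bitmasks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)   # clear the leading bit of b from r when present
        if r:
            basis.append(r)
    return len(basis)

def residual_rank(H_stack, H_j) -> int:
    """Number of dimensions of row-space(H_j) not covered by the rows of H_stack."""
    return gf2_rank(H_stack + H_j) - gf2_rank(H_stack)

# Example 4 with m = 2, n = 6: each H_x is an (m/2) x n matrix (one row here).
H_a, H_b, H_c, H_d = [0b100000], [0b010000], [0b001000], [0b000100]
H1, H2, H3 = H_a + H_b, H_a + H_c, H_a + H_d

stack = H1 + H2                        # systems 1 and 2 fully compromised
r = gf2_rank(stack)                    # collective rank of the compromised set
t = residual_rank(stack, H3)           # residual rank of the system under attack
sar_lower_bound = 2.0 ** (-t)          # guess the unknown syndrome bits uniformly
print(r, t, sar_lower_bound)
```

Here half of the syndrome of system 3 is determined by the compromised data and the other half must be guessed, so the lower bound is one half for this toy code.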

When designing a collection of systems, roughly speaking,
minimizing $r_l$ corresponds to reducing privacy leakage while
maximizing $t_{l,j}$ corresponds to reducing the probability of
successful attack. The system designer must not only choose
matrices with desirable error-correcting properties but also
consider the optimization of these parameters across different
values of l, j, and L. One possible approach is to use the
following pessimistic performance measures, $r_{\max}$ and $t_{\min}$,
which are respectively defined as

$$r_{\max} := \max_{1 \le l \le \binom{u}{L}} r_l,$$

$$t_{\min} := \min_{1 \le l \le \binom{u}{L}} \; \min_{j \in \mathcal{S}_l^c} t_{l,j},$$

where we note that the optimizing l may not be the same for
both measures. The design of a set of parity check matrices
that yields low FRRs, while minimizing $r_{\max}$ and maximizing
$t_{\min}$, appears to be a challenging avenue for future research.
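For small collections, the two pessimistic measures can be evaluated by enumerating all subsets of compromised systems. The sketch below is a brute-force illustration with hypothetical toy matrices (n = 6, m = 2); it takes the inner minimum over systems outside the compromised subset and is not a design procedure.

```python
from itertools import combinations

def gf2_rank(rows) -> int:
    """Rank over GF(2) of a matrix given as a list of integer row bitmasks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)
        if r:
            basis.append(r)
    return len(basis)

def design_metrics(Hs, L: int):
    """Return (r_max, t_min) over all size-L subsets of compromised systems."""
    r_max, t_min = 0, None
    for subset in combinations(range(len(Hs)), L):
        stack = [row for i in subset for row in Hs[i]]
        r = gf2_rank(stack)
        r_max = max(r_max, r)
        for j in range(len(Hs)):
            if j in subset:
                continue            # only systems outside the compromised subset
            t = gf2_rank(stack + Hs[j]) - r
            t_min = t if t_min is None else min(t_min, t)
    return r_max, t_min

H = [0b100000, 0b010000]
identical = [H, H, H]                               # same code on every device
independent = [[0b100000, 0b010000],
               [0b001000, 0b000100],
               [0b000010, 0b000001]]                # jointly independent rows

metrics_identical = design_metrics(identical, L=1)
metrics_independent = design_metrics(independent, L=2)
print(metrics_identical, metrics_independent)
```

Identical codes give the smallest $r_{\max}$ but drive $t_{\min}$ to zero, while fully independent codes maximize $t_{\min}$ at the price of a larger $r_{\max}$, which is the tradeoff formalized above.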

V. Conclusions 

In this paper, we presented a generalized framework for 
modeling secure biometric systems and characterizing their 
security and privacy properties. We conducted a detailed 
information-theoretic analysis of two related types of systems 
based on linear error correcting codes, namely secure sketch 
and fuzzy commitment. We also considered two variants of 
each scheme: keyless and keyed. The second is a two-factor 
scheme in which the biometric system is augmented by a 
secret key held on a smart card. We showed that secure sketch 
and fuzzy-commitment systems are equivalent in terms of the 
false rejection rate, false acceptance rate, successful attack 
rate, and privacy leakage during partial or full compromise 
of biometric templates and smart-card keys. We did, however, 
find a difference in their storage requirements with secure 
sketch requiring less storage. 

In either keyless or two-factor schemes, compromising the 
stored data renders the biometric system vulnerable to attack. 
If the data stored on the device is lost, an adversary can gain 
access to the system with probability one. However, for a two- 
factor system the user's biometric sample remains protected 
(the information-theoretic privacy of the user is maintained) 
so long as the secret key is not compromised. In this scenario, 
the enrollment can be revoked and a new one established. If, 
however, both the stored data and the key are compromised, 
the two-factor scheme is no worse than a keyless scheme. 

We also analyzed the information leakage and authenti- 
cation performance when a user's biometric is enrolled at 
several access control devices. We studied the repercussions 
of data compromise in a subset of the systems. For two- 
factor schemes, the successful attack rate is no larger than 
the nominal false acceptance rate of the system so long as no 
single system suffers from a theft of both the stored data and 
smart card key. Furthermore, no information is leaked about 
the user's biometric in this case. 

When some subset of systems is fully compromised, i.e., 
both the stored data and the secret key are compromised, we 
showed that the information leaked about the user's biometric 
depends on the rank of a matrix formed by stacking the parity 
check matrices of the compromised devices. The successful 
attack rate in this scenario depends on the design of the parity 
check matrices of the compromised devices, specifically on 
the number of independent rows in these matrices. We showed 
via examples that, while designing multiple biometric systems, 
there exists a fundamental tradeoff between the user's privacy, 
i.e., the information leaked about the underlying biometric, 
and the user's security, i.e., the probability that the adversary 
can falsely authenticate as a genuine user. 

Many interesting problems remain open. Most importantly, 
in our opinion, is the situation of multiple fully-compromised 
systems. Providing the complete characterization of the trade- 
off between privacy leakage and probability of successful 



attack in this setting is elusive. Such a characterization would 
provide guidelines for the design of the parity check matrices 
for the constituent systems. Even with such a characterization, 
the joint design of parity check matrices to achieve a point on 
that optimum tradeoff curve will be a challenge. 
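Appendix A
Proof of Lemma 1

By Bayes' theorem,

$$
\Pr[\mathbf{H}\mathbf{A} = \mathbf{s} \mid \tilde{\mathbf{H}}\mathbf{A} = \tilde{\mathbf{s}}]
= \frac{\Pr[\mathbf{H}\mathbf{A} = \mathbf{s},\ \tilde{\mathbf{H}}\mathbf{A} = \tilde{\mathbf{s}}]}{\Pr[\tilde{\mathbf{H}}\mathbf{A} = \tilde{\mathbf{s}}]}
= \frac{\Pr\left[\begin{bmatrix}\mathbf{H}\\ \tilde{\mathbf{H}}\end{bmatrix}\mathbf{A} = \begin{bmatrix}\mathbf{s}\\ \tilde{\mathbf{s}}\end{bmatrix}\right]}{\Pr[\tilde{\mathbf{H}}\mathbf{A} = \tilde{\mathbf{s}}]}
= \Pr[\mathbf{H}\mathbf{A} = \mathbf{s}].
$$

Since $\mathbf{H}$ is full rank, all $2^m$ possible length-$m$ syndrome
vectors $\mathbf{s}$ are reachable by different choices of $\mathbf{A}$. Also, by
the theorem of Lagrange, all cosets are of equal size. Thus,
since all realizations of $\mathbf{A}$ are equally likely, $\mathbf{H}\mathbf{A}$ is uniformly
distributed, i.e., $\Pr[\mathbf{H}\mathbf{A} = \mathbf{s}] = 2^{-m}$. Since $\mathbf{H}$ and $\tilde{\mathbf{H}}$
are linearly independent, the matrix $[\mathbf{H}^T\ \tilde{\mathbf{H}}^T]^T$ has full rank.
Therefore, by the same logic as before, the numerator is equal
to $2^{-(m+\tilde{m})}$. The denominator equals $2^{-\tilde{m}}$ by the same
argument, so the ratio is $2^{-m} = \Pr[\mathbf{H}\mathbf{A} = \mathbf{s}]$.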
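
The counting argument above can be checked by brute force on a toy example. The sketch below is only an illustration under stated assumptions: the matrices `H1` and `H2` are hypothetical (they are not taken from the paper), and the helper names are ours. It enumerates every binary vector of length 4 and confirms that, when the stacked matrix has full row rank, every pair of syndromes occurs equally often, which is the uniformity and independence property used in the proof.

```python
from collections import Counter
from itertools import product

def matvec_gf2(H, a):
    """Multiply a binary matrix H (list of rows) by a binary vector a over GF(2)."""
    return tuple(sum(h * x for h, x in zip(row, a)) % 2 for row in H)

def joint_syndrome_counts(H1, H2, n):
    """Count occurrences of each syndrome pair (H1 a, H2 a) over all a in {0,1}^n."""
    counts = Counter()
    for a in product((0, 1), repeat=n):
        counts[(matvec_gf2(H1, a), matvec_gf2(H2, a))] += 1
    return counts

# Hypothetical toy matrices (n = 4): the three rows are jointly linearly independent.
H1 = [(1, 0, 1, 0), (0, 1, 0, 1)]   # m = 2
H2 = [(1, 1, 0, 0)]                 # second matrix with a single row
n = 4

counts = joint_syndrome_counts(H1, H2, n)
num_pairs = 2 ** (len(H1) + len(H2))          # 2^(2+1) possible syndrome pairs
# Uniformity: every syndrome pair appears, and each appears equally often.
joint_is_uniform = (len(counts) == num_pairs
                    and set(counts.values()) == {2 ** n // num_pairs})
print(joint_is_uniform)
```

A uniform joint distribution over all syndrome pairs implies both that each syndrome is uniform and that the two syndromes are independent.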



Appendix B 
Proof of Theorem 1

In the paragraphs preceding the statement of Theorem 1, it
was proved that the FRR and FAR of both keyed and keyless
variants of fuzzy commitment and secure sketch are the same.

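(i) The FRR is given by

$$P_{FR} = \Pr[d(\hat{\mathbf{W}}) > \tau n],$$

where, since for the legitimate user $\mathbf{D} = \mathbf{B}$ and $\mathbf{L} = \mathbf{K}$,

$$\hat{\mathbf{W}} = \arg\min_{\mathbf{W}:\, \mathbf{H}\mathbf{W} = \mathbf{H}(\mathbf{A} \oplus \mathbf{B})} d(\mathbf{W}).$$

The FRR can be bounded by

$$
\begin{aligned}
P_{FR} &= \Pr[d(\hat{\mathbf{W}}) > \tau n,\ \hat{\mathbf{W}} = \mathbf{A} \oplus \mathbf{B}]
 + \Pr[d(\hat{\mathbf{W}}) > \tau n,\ \hat{\mathbf{W}} \neq \mathbf{A} \oplus \mathbf{B}] \\
&\le \Pr[d(\mathbf{A} \oplus \mathbf{B}) > \tau n] + \Pr[\hat{\mathbf{W}} \neq \mathbf{A} \oplus \mathbf{B}].
\end{aligned}
$$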

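The decoding procedure to produce $\hat{\mathbf{W}}$ is operationally
equivalent to the optimal syndrome decoding of $\mathbf{A}$ from the noisy
version $\mathbf{B}$, since

$$
\hat{\mathbf{W}} = \arg\min_{\mathbf{W}:\, \mathbf{H}\mathbf{W} = \mathbf{H}(\mathbf{A} \oplus \mathbf{B})} d(\mathbf{W})
= \mathbf{B} \oplus \arg\min_{\mathbf{A}':\, \mathbf{H}\mathbf{A}' = \mathbf{H}\mathbf{A}} d(\mathbf{A}' \oplus \mathbf{B}).
$$

Thus, the probability that $\hat{\mathbf{W}}$ fails to recover $\mathbf{A} \oplus \mathbf{B}$ is equal to
the probability of decoding error of the code, which is bounded
by

$$\Pr[\hat{\mathbf{W}} \neq \mathbf{A} \oplus \mathbf{B}] \le 2^{-n E(R) + o(n)}.$$

The probability that $\mathbf{A} \oplus \mathbf{B}$ fails the threshold test can be
bounded by the Chernoff-Hoeffding bound [21]:

$$\Pr[d(\mathbf{A} \oplus \mathbf{B}) > \tau n] \le 2^{-n D(\tau \| p)}.$$

Combining these two bounds yields the bound on the FRR.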

(ii) As discussed in the paragraphs preceding the statement
of Theorem 1, in both the keyed and keyless variants of both
fuzzy commitment and secure sketch systems, regardless of
the distribution of $(\mathbf{C}, \mathbf{J})$, the syndrome is i.i.d. Bernoulli(0.5).
Since $\mathbf{H}$ has full row rank, this implies that all syndromes, or
equivalently all cosets, are equally likely to be selected, each
with probability $2^{-m}$ (there are $2^m$ cosets). Since $P_{FA}$ is equal
to the probability of selecting a coset whose coset leader (the
minimum Hamming weight word in the coset) has a Hamming
weight not more than $\tau n$, and the number of such cosets is
not more than the total number of sequences in $\{0, 1\}^n$ with
Hamming weight at most $\tau n$, it follows that

$$
\begin{aligned}
P_{FA} &\le 2^{-m}\, |\{\mathbf{w} : d(\mathbf{w}) \le \tau n\}| \\
&= 2^{-m} \sum_{i=0}^{\tau n} |\{\mathbf{w} : d(\mathbf{w}) = i\}| \\
&= 2^{-m} \sum_{i=0}^{\tau n} \binom{n}{i} \\
&\le 2^{-m}\, 2^{n h_b(\tau)},
\end{aligned}
$$

where the second inequality above is due to [22, Lemma 8, Ch.
10] since $\tau < 0.5$.

(iii) In both of the two-factor systems, an adversary with
knowledge of only $\mathbf{K}$ submits attack vectors $(\mathbf{C}, \mathbf{J})$ that are
independent of $\mathbf{A}$. Hence, the distribution of the syndrome is
Bernoulli(0.5), as in the FAR analysis, and thus

$$P_{SA}(\mathbf{K}) = P_{FA}.$$

(iv) An adversary with knowledge of only $\mathbf{A}$ submits attack
vectors $(\mathbf{C}, \mathbf{J})$ that are independent of $\mathbf{K}$. Hence the
distribution of the syndrome is again Bernoulli(0.5), and thus

$$P_{SA}(\mathbf{A}) = P_{FA}.$$

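(v) Recall that $g_{SS}(\mathbf{D}, \mathbf{L}, \mathbf{S}) = \mathbf{H}\mathbf{D} \oplus \mathbf{L} \oplus \mathbf{S}$ and
$g_{FC}(\mathbf{D}, \mathbf{L}, \mathbf{S}) = \mathbf{H}(\mathbf{D} \oplus \mathbf{L} \oplus \mathbf{S})$. With knowledge of $\mathbf{S}$,
an adversary can choose $\mathbf{C} = \mathbf{0}$ and $\mathbf{J} = \mathbf{S}$ to make
$g_{SS}(\mathbf{D}, \mathbf{L}, \mathbf{S}) = g_{FC}(\mathbf{D}, \mathbf{L}, \mathbf{S}) = \mathbf{0}$, so that $\hat{\mathbf{W}} = \mathbf{0}$ and the system
authenticates the adversary. Thus,

$$P_{SA}(\mathbf{S}) = 1.$$

Since $P_{SA}(\mathbf{V}_1, \mathbf{V}_2) \ge P_{SA}(\mathbf{V}_1)$, we also have

$$P_{SA}(\mathbf{S}) = P_{SA}(\mathbf{A}, \mathbf{S}) = P_{SA}(\mathbf{K}, \mathbf{S}) = P_{SA}(\mathbf{A}, \mathbf{K}, \mathbf{S}) = 1.$$

In a similar manner, one can show that with knowledge of both
$\mathbf{A}$ and $\mathbf{K}$, an adversary can set the syndrome to any desired
value and thus,

$$P_{SA}(\mathbf{A}, \mathbf{K}) = 1.$$

(vi) As in the proof of part (v), in the keyless versions of the
fuzzy commitment and secure sketch systems, an adversary
with knowledge of $\mathbf{S}$ alone or $\mathbf{A}$ alone can set the syndrome
to a value that makes the decoder select a coset with a
low-weight sequence with probability one. Hence,

$$P_{SA}(\mathbf{S}) = P_{SA}(\mathbf{A}) = 1.$$

Finally, since $P_{SA}(\mathbf{V}_1, \mathbf{V}_2) \ge P_{SA}(\mathbf{V}_1)$, we also have

$$P_{SA}(\mathbf{A}, \mathbf{S}) = 1.$$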
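
The attack in part (v) can be illustrated numerically for the two-factor secure sketch. The following is a minimal sketch under stated assumptions: the parity check matrix `H` is a hypothetical toy example, the enrollment is taken to be $\mathbf{S} = \mathbf{H}\mathbf{A} \oplus \mathbf{K}$ as in Appendix D, and the function names are ours. It shows that an adversary who knows only the stored data $\mathbf{S}$ forces the all-zero decoding syndrome by submitting $\mathbf{C} = \mathbf{0}$ and $\mathbf{J} = \mathbf{S}$.

```python
import random

def matvec_gf2(H, a):
    """Multiply a binary matrix H (list of rows) by a binary vector a over GF(2)."""
    return tuple(sum(h * x for h, x in zip(row, a)) % 2 for row in H)

def xor(u, v):
    """Component-wise XOR of two equal-length binary tuples."""
    return tuple(x ^ y for x, y in zip(u, v))

def g_ss(H, D, L, S):
    """Decoding syndrome of the secure sketch: H D xor L xor S."""
    return xor(xor(matvec_gf2(H, D), L), S)

# Hypothetical 3 x 6 parity check matrix (illustration only).
H = [(1, 0, 1, 1, 0, 0),
     (0, 1, 0, 1, 1, 0),
     (0, 0, 1, 0, 1, 1)]
n, m = 6, 3

rng = random.Random(0)
A = tuple(rng.randint(0, 1) for _ in range(n))   # enrollment biometric
K = tuple(rng.randint(0, 1) for _ in range(m))   # secret key
S = xor(matvec_gf2(H, A), K)                     # stored data

# The adversary knows S only: submit C = 0 and J = S.
C = (0,) * n
J = S
attack_syndrome = g_ss(H, C, J, S)

# A legitimate noiseless probe (D = A, L = K) also yields the zero syndrome.
legit_syndrome = g_ss(H, A, K, S)
print(attack_syndrome, legit_syndrome)
```

The zero syndrome selects the coset containing the all-zero word, whose coset leader has weight zero and therefore passes any threshold test, regardless of the values of $\mathbf{A}$ and $\mathbf{K}$.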
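Appendix C
Proof of Theorem 2

(i) Let $\mathcal{S}_a \subseteq \mathcal{S}$ denote the subset for which there exist $\mathbf{D}, \mathbf{L}$
such that $g(\mathbf{D}, \mathbf{L}, \mathbf{S}) = 1$. If $\mathbf{S} \notin \mathcal{S}_a$, then $\hat{\theta} = 0$. Therefore,
the FRR must be bounded by

$$P_{FR} \ge \Pr[\mathbf{S} \notin \mathcal{S}_a].$$

Since the adversary can gain access (with probability one)
when $\mathbf{S} \in \mathcal{S}_a$, the SAR can be bounded as

$$P_{SA}(\mathbf{S}) \ge \Pr[\mathbf{S} \in \mathcal{S}_a] \ge 1 - P_{FR}.$$

(ii) If $\mathcal{S}_a = \mathcal{S}$, then the adversary can always choose $\mathbf{C}$ and
$\mathbf{J}$ such that $\hat{\theta} = g(\mathbf{D}, \mathbf{L}, \mathbf{S}) = 1$ in order to gain access with
probability one.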



Appendix D 
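Proof of Theorem 3

In the two-factor fuzzy commitment scheme, $\mathbf{A}$, $\mathbf{K}$, and $\mathbf{Z}$
are mutually independent Bernoulli(0.5) sequences and
$\mathbf{S} = \mathbf{A} \oplus \mathbf{G}^T\mathbf{Z} \oplus \mathbf{K}$. In the two-factor secure sketch scheme, $\mathbf{A}$
and $\mathbf{K}$ are mutually independent Bernoulli(0.5) sequences and
$\mathbf{S} = \mathbf{H}\mathbf{A} \oplus \mathbf{K}$. Thus for both two-factor schemes, $\mathbf{A}$ and $\mathbf{K}$
are mutually independent and so are $\mathbf{A}$ and $\mathbf{S}$. This implies
that

$$I(\mathbf{A}; \mathbf{S}) = I(\mathbf{A}; \mathbf{K}) = 0$$

for both two-factor fuzzy commitment and two-factor secure
sketch schemes.

For the two-factor fuzzy commitment scheme,

$$
\begin{aligned}
I(\mathbf{A}; \mathbf{S}, \mathbf{K}) &= H(\mathbf{S}, \mathbf{K}) - H(\mathbf{S}, \mathbf{K} \mid \mathbf{A}) \\
&= H(\mathbf{K}) + H(\mathbf{S} \mid \mathbf{K}) - H(\mathbf{K} \mid \mathbf{A}) - H(\mathbf{S} \mid \mathbf{K}, \mathbf{A}) \\
&= H(\mathbf{K}) + H(\mathbf{A} \oplus \mathbf{G}^T\mathbf{Z}) - H(\mathbf{K}) - H(\mathbf{G}^T\mathbf{Z}) \\
&= n - k = m.
\end{aligned}
$$

For the two-factor secure sketch scheme,

$$
\begin{aligned}
I(\mathbf{A}; \mathbf{S}, \mathbf{K}) &= H(\mathbf{S}, \mathbf{K}) - H(\mathbf{S}, \mathbf{K} \mid \mathbf{A}) \\
&= H(\mathbf{K}) + H(\mathbf{S} \mid \mathbf{K}) - H(\mathbf{K} \mid \mathbf{A}) - H(\mathbf{S} \mid \mathbf{A}, \mathbf{K}) \\
&= H(\mathbf{K}) + H(\mathbf{H}\mathbf{A}) - H(\mathbf{K}) - 0 \\
&= H(\mathbf{H}\mathbf{A}) = m.
\end{aligned}
$$

In the keyless fuzzy commitment scheme,

$$
\begin{aligned}
I(\mathbf{A}; \mathbf{S}) &= H(\mathbf{S}) - H(\mathbf{S} \mid \mathbf{A}) \\
&= H(\mathbf{A} \oplus \mathbf{G}^T\mathbf{Z}) - H(\mathbf{A} \oplus \mathbf{G}^T\mathbf{Z} \mid \mathbf{A}) \\
&= H(\mathbf{A}) - H(\mathbf{G}^T\mathbf{Z}) \\
&= n - k = m.
\end{aligned}
$$

And finally, in the keyless secure sketch scheme,

$$I(\mathbf{A}; \mathbf{S}) = H(\mathbf{S}) - H(\mathbf{S} \mid \mathbf{A}) = H(\mathbf{S}) = m.$$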
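
These leakage values can be confirmed by exhaustive enumeration for a small secure sketch. The sketch below is an illustration only: the matrix `H` is a hypothetical toy example with $n = 5$ and $m = 2$, the helper names are ours, and $\mathbf{A}$ and $\mathbf{K}$ are assumed uniform and independent as in the proof. It computes $I(\mathbf{A};\mathbf{S})$ for the keyless scheme, and $I(\mathbf{A};\mathbf{S})$ and $I(\mathbf{A};\mathbf{S},\mathbf{K})$ for the two-factor scheme.

```python
from collections import Counter
from itertools import product
from math import log2

def matvec_gf2(H, a):
    """Syndrome H a over GF(2); H is a list of binary rows, a is a binary tuple."""
    return tuple(sum(h * x for h, x in zip(row, a)) % 2 for row in H)

def entropy_bits(counts):
    """Shannon entropy (bits) of the distribution proportional to the counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def mutual_information(pairs):
    """I(X;Y) in bits when the listed (x, y) outcomes are equally likely."""
    pairs = list(pairs)
    hx = entropy_bits(Counter(x for x, _ in pairs))
    hy = entropy_bits(Counter(y for _, y in pairs))
    hxy = entropy_bits(Counter(pairs))
    return hx + hy - hxy

# Hypothetical toy parity check matrix with full row rank (n = 5, m = 2).
H = [(1, 0, 1, 1, 0), (0, 1, 1, 0, 1)]
n, m = 5, 2
biometrics = list(product((0, 1), repeat=n))
keys = list(product((0, 1), repeat=m))

def masked(a, k):
    """Two-factor stored data S = H a xor k."""
    return tuple(s ^ b for s, b in zip(matvec_gf2(H, a), k))

# Keyless secure sketch: S = H A.
leak_keyless = mutual_information((a, matvec_gf2(H, a)) for a in biometrics)
# Two-factor secure sketch: stored data alone, then stored data together with the key.
leak_s_only = mutual_information((a, masked(a, k)) for a in biometrics for k in keys)
leak_s_and_k = mutual_information((a, (masked(a, k), k)) for a in biometrics for k in keys)
print(leak_keyless, leak_s_only, leak_s_and_k)
```

For this toy matrix the enumeration gives $m = 2$ bits for the keyless scheme and for the fully compromised two-factor scheme, and zero bits when only the masked stored data is exposed.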



Appendix E 
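Proof of Theorem 4

To obtain the main result, we show that

$$
\begin{aligned}
I(\mathbf{A}_0; \mathbf{V}_1, \ldots, \mathbf{V}_u)
&\stackrel{(a)}{=} I(\mathbf{A}_0; \mathbf{V}_1, \ldots, \mathbf{V}_l) \\
&\stackrel{(b)}{=} I(\mathbf{A}_0; \mathbf{K}_1, \ldots, \mathbf{K}_l, \mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l) \\
&\stackrel{(c)}{=} I(\mathbf{A}_0; \mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l).
\end{aligned}
$$

Each step is justified by the following arguments:

(a) is due to the chain rule for mutual information since
$(\mathbf{V}_{l+1}, \ldots, \mathbf{V}_u)$ is independent of $(\mathbf{A}_0, \mathbf{V}_1, \ldots, \mathbf{V}_l)$.

(b) holds since $\mathbf{V}_1, \ldots, \mathbf{V}_l$ is informationally equivalent to
$\mathbf{K}_1, \ldots, \mathbf{K}_l, \mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l$. For the secure sketch
system, the equivalence is immediate since, for
$i \in \{1, \ldots, l\}$, $\mathbf{V}_i = (\mathbf{S}_i, \mathbf{K}_i)$ is equivalent to $(\mathbf{H}_i\mathbf{A}_i, \mathbf{K}_i)$.
To show the information equivalence for
the fuzzy commitment system, note that for
$i \in \{1, \ldots, l\}$, $\mathbf{V}_i$ is equivalent to $(\mathbf{A}_i \oplus \mathbf{G}_i^T\mathbf{Z}_i, \mathbf{K}_i)$. Since
$(\mathbf{H}_i\mathbf{A}_i, \mathbf{K}_i)$ is a function of $\mathbf{V}_i$, the information
processing inequality gives
$I(\mathbf{A}_0; \mathbf{V}_1, \ldots, \mathbf{V}_l) \ge I(\mathbf{A}_0; \mathbf{K}_1, \ldots, \mathbf{K}_l, \mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l)$.
But the information processing inequality also gives
$I(\mathbf{A}_0; \mathbf{V}_1, \ldots, \mathbf{V}_l) \le I(\mathbf{A}_0; \mathbf{K}_1, \ldots, \mathbf{K}_l, \mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l)$
since
$\mathbf{A}_0 - (\mathbf{K}_1, \ldots, \mathbf{K}_l, \mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l) - (\mathbf{K}_1, \ldots, \mathbf{K}_l, \mathbf{A}_1 \oplus \mathbf{G}_1^T\mathbf{Z}_1, \ldots, \mathbf{A}_l \oplus \mathbf{G}_l^T\mathbf{Z}_l)$
forms a Markov chain. This is because $\mathbf{A}_i \oplus \mathbf{G}_i^T\mathbf{Z}_i$ is a
word that is independently chosen from the coset
corresponding to $\mathbf{H}_i\mathbf{A}_i$.

(c) is due to the chain rule for mutual information since
$(\mathbf{K}_1, \ldots, \mathbf{K}_l)$ is independent of $(\mathbf{A}_0, \mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l)$.

To prove parts (i) and (ii) of Theorem 4 we continue as

$$
\begin{aligned}
I(\mathbf{A}_0; \mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l)
&\le I(\mathbf{A}_0; \mathbf{H}_1\mathbf{A}_0, \ldots, \mathbf{H}_l\mathbf{A}_0) \\
&= H(\mathbf{H}_1\mathbf{A}_0, \ldots, \mathbf{H}_l\mathbf{A}_0) - H(\mathbf{H}_1\mathbf{A}_0, \ldots, \mathbf{H}_l\mathbf{A}_0 \mid \mathbf{A}_0) \\
&= H(\mathbf{H}_1\mathbf{A}_0, \ldots, \mathbf{H}_l\mathbf{A}_0) \\
&= \operatorname{rank}(\mathbf{H}_1, \ldots, \mathbf{H}_l).
\end{aligned}
$$

The inequality is due to the information processing inequality
and the fact that
$\mathbf{A}_0 - (\mathbf{H}_1\mathbf{A}_0, \ldots, \mathbf{H}_l\mathbf{A}_0) - (\mathbf{H}_1\mathbf{A}_1, \ldots, \mathbf{H}_l\mathbf{A}_l)$
forms a Markov chain. This inequality holds with equality
for noiseless enrollments. The last equality follows from
Lemma 1.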
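
Since the leakage bound is the rank, over GF(2), of the stacked parity check matrices of the fully compromised systems, it is straightforward to evaluate for a given design. The following is a minimal sketch, assuming small hypothetical matrices `H1` and `H2` (not from the paper) and a plain Gaussian elimination routine whose name `gf2_rank` is ours.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a binary matrix given as a list of equal-length rows."""
    rows = [list(r) for r in rows]          # work on a mutable copy
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Find a pivot row (at or below the current rank) with a 1 in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column in every other row by XORing in the pivot row.
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Hypothetical parity check matrices of two fully compromised systems (n = 4).
H1 = [(1, 0, 1, 0), (0, 1, 0, 1)]
H2 = [(1, 1, 1, 1), (0, 0, 1, 1)]   # first row equals the sum of the rows of H1

leak_system_1 = gf2_rank(H1)        # bits leaked if only system 1 is compromised
leak_system_2 = gf2_rank(H2)        # bits leaked if only system 2 is compromised
leak_both = gf2_rank(H1 + H2)       # bound on bits leaked if both are compromised
print(leak_system_1, leak_system_2, leak_both)
```

In this example each system alone leaks 2 bits, but compromising both leaks at most 3 bits rather than 4, because one row of `H2` lies in the row space of `H1`. Reusing rows across systems in this way limits the total leakage.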

Appendix F 
Proof of Theorem 5

(i) This is an immediate corollary of Theorem 1(v): Similar
to the single system case, knowledge of $\mathbf{S}_j$ or $(\mathbf{A}_j, \mathbf{K}_j)$,
which is sufficient to generate $\mathbf{S}_j$, allows the adversary
to authenticate with probability one.

(ii) The adversary can set the attack vectors $\mathbf{C}$ to $\mathbf{A}_i$ and $\mathbf{J}$ to
$\mathbf{K}_j$. The authentication attack succeeds with probability
at least as large as $1 - P_{FR}(j)$ since, for $0 < p_i < p_j < 0.5$
and $0 < \alpha < 0.5$, the noise level between $\mathbf{A}_j$ and $\mathbf{A}_i$
can only be lower than the noise level between $\mathbf{A}_j$ and
a legitimate probe biometric $\mathbf{B}$.

(iii) The compromised data $\mathbf{V}$ is independent of $\mathbf{S}_j$ since
it does not contain $\mathbf{K}_j$, which is independent and i.i.d.
Bernoulli(0.5). Hence, any attack vectors $\mathbf{C}$ and $\mathbf{J}$ would
be independent of $\mathbf{S}_j$ and result in a uniformly distributed
decoding syndrome $g_{SS}(\mathbf{C}, \mathbf{J}, \mathbf{S}_j)$ according to (2). This
results in a probability of successful attack equivalent to
the probability of false accept.

(iv) Similar to part (iii) above, the compromised data $\mathbf{V}$ is
independent of $\mathbf{A}_j$, which is itself i.i.d. Bernoulli(0.5).
Hence, any attack vectors $\mathbf{C}$ and $\mathbf{J}$ result in a uniformly
distributed decoding syndrome $g_{SS}(\mathbf{C}, \mathbf{J}, \mathbf{S}_j)$ and a
probability of successful attack equal to the probability of
false accept.

Appendix G 
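Proof of Theorem 6

(i) For $i \le l$, since both $\mathbf{S}_i$ and $\mathbf{K}_i$ are compromised, the
syndrome $\mathbf{H}_i\mathbf{A}_i$ is known to the adversary. In the case
of noiseless enrollments, $\mathbf{H}_i\mathbf{A}_i = \mathbf{H}_i\mathbf{A}_0$. The linear
dependence of the rows of $\mathbf{H}_j$ on the rows of the
compromised parity check matrices allows $\mathbf{H}_j\mathbf{A}_0$ to be
determined as a function of the compromised data. The stored
data for system $j$ can be recovered as
$\mathbf{S}_j = \mathbf{K}_j \oplus \mathbf{H}_j\mathbf{A}_j = \mathbf{K}_j \oplus \mathbf{H}_j\mathbf{A}_0$.
By Theorem 5(i), the adversary can therefore
falsely authenticate with system $j$ with probability one.

(ii) Since $\operatorname{rank}(\mathbf{H}_1, \ldots, \mathbf{H}_l, \mathbf{H}_j) = \operatorname{rank}(\mathbf{H}_1, \ldots, \mathbf{H}_l)$, each
row of $\mathbf{H}_j$ can be expressed as a linear combination of
the rows of $\{\mathbf{H}_1, \ldots, \mathbf{H}_l\}$. Let $\mathbf{H}_j = \mathbf{M}_j[\mathbf{H}_1^T \ldots \mathbf{H}_l^T]^T$,
where $\mathbf{M}_j$ is an $m_j \times (m_1 + \ldots + m_l)$ matrix of
coefficients. Suppose that the attacker chooses the attack
vector pair $(\mathbf{C}, \mathbf{J})$, cf. Fig. 2, such that
$\mathbf{H}_j\mathbf{C} = \mathbf{M}_j[(\mathbf{H}_1\mathbf{A}_1)^T \ldots (\mathbf{H}_l\mathbf{A}_l)^T]^T$ (this can always be done)
and $\mathbf{J} = \mathbf{K}_j$. Then, the syndrome formed in the
authentication (decoding) step of the $j$-th two-factor secure
sketch system would be

$$
\begin{aligned}
\mathbf{H}_j\mathbf{C} + \mathbf{H}_j\mathbf{A}_j
&= \mathbf{H}_j\mathbf{C} + \mathbf{M}_j[(\mathbf{H}_1\mathbf{A}_j)^T \ldots (\mathbf{H}_l\mathbf{A}_j)^T]^T \\
&= \mathbf{M}_j[(\mathbf{H}_1(\mathbf{A}_1 + \mathbf{A}_j))^T \ldots (\mathbf{H}_l(\mathbf{A}_l + \mathbf{A}_j))^T]^T.
\end{aligned}
$$

If instead $\mathbf{D} = \mathbf{B}_j$, where $\mathbf{B}_j$ is a legitimate probe vector
for system-$j$, and $\mathbf{L} = \mathbf{K}_j$, then the syndrome formed in
the authentication (decoding) step of the $j$-th two-factor
secure sketch system would be

$$\mathbf{H}_j(\mathbf{B}_j + \mathbf{A}_j) = \mathbf{M}_j[(\mathbf{H}_1(\mathbf{B}_j + \mathbf{A}_j))^T \ldots (\mathbf{H}_l(\mathbf{B}_j + \mathbf{A}_j))^T]^T.$$

Since for all $i \in \{1, \ldots, u\}$, the enrollment channel crossover
probability $p_i < \alpha_j$, the probe channel crossover
probability, each $\mathbf{A}_i$ is a less "noisy" version of $\mathbf{A}_0$ than $\mathbf{B}_j$.
Thus, the probability of system-$j$ rejecting the specified
attack vectors cannot be more than the probability that a
legitimate probe vector $\mathbf{B}_j$ is rejected (given by $P_{FR}(j)$).
Thus the authentication attack will succeed with a
probability which is at least $1 - P_{FR}(j)$.

(iii) When the rows of $\mathbf{H}_j$ are linearly independent of the rows
of the compromised parity check matrices, the syndrome
$\mathbf{S}_j = \mathbf{H}_j\mathbf{A}_j$ is independent of the compromised
data due to Lemma 1. Another way of seeing this is to
consider the authentication procedure in Section III-B.
Using similar notation, the adversary seeks a $\hat{\mathbf{W}}$ such
that

$$\hat{\mathbf{W}} = \arg\min_{\mathbf{W}:\, \mathbf{H}_j\mathbf{W} = \mathbf{H}_j\mathbf{D} \oplus \mathbf{S}_j} d(\mathbf{W}),$$

where the adversary synthesizes $\mathbf{D}$ as a function of $\mathbf{H}_i\mathbf{A}_i$,
$i = 1, 2, \ldots, l$. But, since the rows of $\mathbf{H}_j$ are linearly
independent of the rows of $\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_l$, the decoding
syndrome $\mathbf{S}_j$ of the target system remains independent
and uniformly distributed for any choice of $\mathbf{D}$ made by
the adversary based on the compromised data. Hence, the
probability of successful attack is no larger than the false
acceptance rate.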
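
The linear-dependence argument of parts (i) and (ii) can be made concrete with a small example. The sketch below is an illustration under stated assumptions: the compromised matrices `H1`, `H2` and the coefficient matrix `Mj` are hypothetical, enrollments are taken to be noiseless, and the helper names are ours. It checks that an adversary holding only the leaked syndromes $\mathbf{H}_1\mathbf{A}_0$ and $\mathbf{H}_2\mathbf{A}_0$ can compute the target syndrome $\mathbf{H}_j\mathbf{A}_0$ for every possible biometric.

```python
from itertools import product

def matvec_gf2(M, v):
    """Multiply a binary matrix M (list of rows) by a binary vector v over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, v)) % 2 for row in M)

def matmul_gf2(M, N):
    """Product of binary matrices M and N over GF(2), returned as a list of rows."""
    cols = list(zip(*N))
    return [tuple(sum(a * b for a, b in zip(row, col)) % 2 for col in cols)
            for row in M]

# Hypothetical parity check matrices of two fully compromised systems (n = 5).
H1 = [(1, 0, 1, 0, 1), (0, 1, 0, 1, 0)]
H2 = [(0, 0, 1, 1, 0)]
stacked = H1 + H2                       # rows of H1 followed by rows of H2

# Hypothetical coefficient matrix: rows of Hj are combinations of compromised rows.
Mj = [(1, 1, 0), (0, 1, 1)]
Hj = matmul_gf2(Mj, stacked)            # target system's parity check matrix

def adversary_prediction(a0):
    """Target syndrome computed from the leaked syndromes only (noiseless case)."""
    leaked = matvec_gf2(H1, a0) + matvec_gf2(H2, a0)   # concatenated syndromes
    return matvec_gf2(Mj, leaked)

# The prediction matches the true target syndrome Hj a0 for every biometric a0.
prediction_always_correct = all(
    adversary_prediction(a0) == matvec_gf2(Hj, a0)
    for a0 in product((0, 1), repeat=5)
)
print(Hj, prediction_always_correct)
```

With noisy enrollments the same computation yields the syndrome of a vector that is close to $\mathbf{A}_j$ rather than equal to it, which is why part (ii) gives a success probability of at least $1 - P_{FR}(j)$ instead of one.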



